Why Read This Book?


One of Euclid’s geometry students asked a familiar question more than 2000 years ago. 
After learning the first theorem, he asked, “What shall I get by learning these things?” 
Euclid didn’t have the kind of answer the student was looking for, so he did what anyone 
would do — he got annoyed and sarcastic. The story goes that he called his slave and said, 
“Give him threepence since he must make gain out of what he learns.”¹

It is a familiar question: “So how am I ever gonna use this stuff?” I doubt that anyone 
has ever come up with a good answer because it’s really the wrong question. The first 
question is not what you're going to do with this stuff, but what this stuff is going to do 
with you. 

This book is not a computer users’ manual that will make you into a computer 
industry millionaire. It is not a collection of tax law secrets that will save you thousands 
of dollars in taxes. It is not even a compilation of important mathematical results for you 
to stack on top of the other mathematics you have learned. Instead, it’s an entrance into 
a new kingdom, the world of mathematics, where you learn to think and write as the 
inhabitants do. 

Mathematics is a discipline that requires a certain type of thinking and communicating
that many appreciate but few develop to a great degree. Developing these skills
involves dissecting the components of mathematical language, analyzing their structure,
and seeing how they fit together. Once you have become comfortable with these principles,
then your own style of mathematical writing can begin to shine through.

Writing mathematics requires a precision that seems a little stifling because at first it 
might feel like some pedant is forcing you to use prechosen words and phrases to express 
the things you see clearly with your own mind’s eye. Be patient. In time you'll see how 
adapting to the culture of mathematics and adopting its style of communicating will 
shape all your thinking and writing. You'll see your skills of critical analysis become more 
developed and polished. My hope is that these skills will influence the way you organize
and present your thoughts in everything from English composition papers to late night
bull sessions with friends.

¹ T.L. Heath, A History of Greek Mathematics, Oxford, 1931.

Here’s an analogy of what the first principles of this book will do for you. Consider 
a beginning student of the piano. Music is one of the most creative disciplines, and our 
piano student has been listening to Chopin for some time. She knows she has a true 
ear and an intuition for music. However, she must begin at the piano by playing scales 
over and over. These exercises develop her ability to use the piano effectively in order to 
express the creativity within her. Furthermore, these repetitive tasks familiarize her with 
the structure of music as an art form, and actually nurture and expand her capacity to 
express herself in original and creative ways through music. Then, once she has mastered 
the basic technical skills of hitting the keys, she understands more clearly how really 
enjoyable music can be. She learns this truth: The aesthetic elements of music cannot 
be fully realized until the technical skills developed by rote exercises have been mastered 
and can be relegated to the subconscious. 

Your first steps to becoming a mathematician are a lot like those for our pianist. You 
will first be introduced to the building blocks of mathematical structure, then practice 
the precision required to communicate mathematics correctly. The drills you perform in 
this practice will help you see mathematics as a discipline more clearly and equip you to 
appreciate its beauty. 

Let n be a positive integer, and think of this course as a trip through a new country 
on a bicycle built for n. The purposes of the trip are: 


To familiarize you with the territory; 

To equip you to explore it on your own; 

To give you some panoramic views of the countryside; 

To teach you to communicate with the inhabitants; and 

To help you begin to carve out your own niche. 

If you are willing to do the work, I promise you will enjoy the trip. You'll need help pedaling
at first, and occasionally when the hills are steep. But you'll come back a different person,
for this material will have done something with you. Then you'll understand that Euclid
really got it right after all, and you'll appreciate why his witty response is still fresh and
relevant after 2000 years.


Preface 


This text is written for a “transition course” in mathematics, where students learn to write 
proofs and communicate with a level of rigor necessary for success in their upper level 
mathematics courses. To achieve the primary goals of such a course, this text includes a 
study of the basic principles of logic, techniques of proof, and fundamental mathematical 
results and ideas (sets, functions, properties of real numbers), though it goes much further. 
It is based on two premises: The most important skill students can learn as they approach 
the cusp between lower- and upper-level courses is how to compose clear and accurate 
mathematical arguments; and they need more help in developing this skill than they 
would normally receive by diving into standard upper-level courses. By emphasizing 
how one writes mathematical prose, it is also designed to prepare students for the task 
of reading upper-level mathematics texts. Furthermore, it is my hope that transitioning 
students in this way gives them a view of the mathematical landscape and its beauty, 
thereby engaging them to take ownership of their pursuit of mathematics. 


Why this text? 


I believe students learn best by doing. In many mathematics courses it is difficult to 
find enough time for students to discover through their own efforts the mathematics we 
would lead them to find. However, I believe there is no other effective way for students 
to learn to write proofs. This text is written for them in a format that allows them to do 
precisely this. 

Two principles of this text are fundamental to its design as a tool whereby students 
learn by doing. First, it does not do too much for them. Proofs are included in this text 
for only two reasons. Most of them (especially at the beginning) are sample proofs that 
students can mimic as they write their own proofs to similar theorems. Students must 
read them because they will need this technique later. The other proofs are included here 
because they are too difficult for students of this mathematics skill level to be expected
to develop on their own. In most of these instances, however, some climactic detail is 
omitted and relegated to an exercise. 

Second, if students are going to learn by doing, they must be presented with doable 
tasks. This text is designed to be a sequence of stepping stones placed just the right 
distance apart. Moving from one stone to the next involves writing a proof. Seeing how 
to step there comes from reading the exposition and calls on the experience that led the 
student to the current stone. At first, stones are very close together, and there is much 
guidance. Progressing through the text, stones become increasingly farther apart, and 
some of the guidance might be either relegated to a footnote or omitted altogether. 

I have written this text with a very deliberate trajectory of style. It is conversational 
throughout, though the exposition becomes more sophisticated and succinct as students 
progress through the chapters. 


Organization 


This text is organized in the following way. Chapter 0 spells out all assumptions to be used 
in writing proofs. These are not necessarily standard axioms of mathematics, and they 
are not presented in the context or language of more abstract mathematical structures. 
They are designed merely to be a starting point for logical development, so that students 
appreciate quickly that everything we call on is either stated up front as an assumption, 
or proved from these assumptions. Although Chapter 0 contains much mathematical 
information, students can probably read it on their own as the course begins, knowing 
that it is there primarily as a reference. 

Part I begins with logic, but does not focus on it. In Chapter 1, truth tables and manipulation
of logical symbols are included to give students an understanding of mathematical
grammar, of the underlying skeletal structure of mathematical prose, and of equivalent
ways of communicating the same mathematical idea. Chapters 2 and 3 put these to use
right away in proof writing, and allow the students to cut their teeth on the most basic
mathematical ideas. The context of topics in Chapters 2 and 3 is often rather specific,
though certainly more broadly applicable. It is designed to ground the students in familiar
territory first, then move into generalized structures later. Abstraction is the goal, not
the beginning.

Parts II and III are two completely independent paths, the former into analysis, the 
latter into algebra. Like Antoni Gaudi’s Sagrada Familia, the unfinished cathedral in 
Barcelona, Spain, where narrow spires rise from a foundation to give spectacular views, 
Parts II and III are purposefully designed to rest on the foundation of Part I and climb 
quickly into analysis or algebra. Many topics and specific results are omitted along the 
way, but Parts II and III rest securely on the foundation of Part I and allow students to 
continue to develop their skills at proof writing by climbing to a height where, I hope, 
they have a nice view of mathematics. 


Flexibility


This text can be used in a variety of ways. It is suitable for use in different class settings 
and there is much flexibility in the material one may choose to cover. 

First, because this text speaks directly to the student, it can naturally be used in 
a setting where students are given responsibility for the momentum of the class. It is 
written so that students can read the material on their own first, then bring to class 
the fruits of their work on the exercises, and present these to the instructor and each 
other for discussion and critique. If class time and size limit the practicality of such 
a student-driven approach, then certainly other approaches are possible. To illustrate, 
we may consider three components of a course’s activity, and arrange them in several 
ways. The components are: 1) the students’ reading of the material; 2) the instructor’s 
elaboration on the material; and 3) the students’ work on the exercises, either to be 
presented in class or turned in. When I teach from this text, component 1) is first, 3) 
follows on its heels, and 2) and 3) work in conjunction until a section is finished. Others 
might want to arrange these components in another order, for example, beginning with 
2), then following with 1) and 3). 

Which material an instructor would choose to cover will depend on the purpose of 
the course, personal taste, and how much time there is. Here are two broad options. 


1. To proceed quickly into either analysis or algebra, first cover the material from Part 
I that lays the foundation. Almost all sections and exercises of Part I are necessary 
for Parts II and III. However, the Instructor’s Guide and Solutions Manual notes 
precisely which sections, theorems, and exercises are necessary for each path, and 
which may be safely omitted without leaving any holes in the logical progression. 
Of course, even if a particular result is necessary later, one might decide that 
to omit its proof details does not deprive the students of a valuable learning 
experience. The instructor might choose simply to elaborate on how one would 
go about proving a certain theorem, then allow the students to use it as if they 
had proved it themselves. 


2. Cover Part I in its entirety, saving specific analysis and algebra topics for later 
courses. This option might be most realistic for courses of two or three units 
where all the Part I topics are required. Even with this approach, there would 
likely be time to cover the beginnings of Parts II and/or III. This might be the
preferred choice for those who do not want to study analysis or algebra with the 
degree of depth and breadth characteristic of this text. 


This book would not have become a reality without the help of many people. First 
thanks go to Carolyn Vos Strache, Chair of the Natural Science Division at Pepperdine 
University, for providing me with resources as this project got off the ground. As it has 
taken shape, this project has come to bear the marks of many people whose meticulous 
dissection of several drafts inspired many suggestions for improvements. My thanks to 
colleagues at Pepperdine and from across the country for their hard work in helping 
shape this volume into its present form: 


Bradley Brock, Pepperdine University, Malibu, CA
Julie Glass, California State University, Hayward, CA
Howard Hamilton, California State University, Sacramento, CA
Jayne Ann Harder, University of Texas, Austin, TX
Kevin Iga, Pepperdine University, Malibu, CA
Irene Loomis, University of Tennessee, Chattanooga, TN
Carlton J. Maxson, Texas A&M University, College Station, TX
Bruce Mericle, Minnesota State University, Mankato, MN
Kandasamy Muthuvel, University of Wisconsin, Oshkosh, WI
Kamal Narang, University of Alaska, Anchorage, AK
Travis Thompson, Harding University, Searcy, AR
Steven Williams, Brigham Young University, Provo, UT


I have written this book with the student foremost in mind. Many of my students have 
shaped this text from the beginning by their hard work in my class. Several students, 
both my own and from other universities across the country, have also made formal and 
useful suggestions. Their marks are indelible. 


Erik Baumgarten, Texas A&M University, College Station, TX
Justin Greenough, University of Alaska, Anchorage, AK
Reuben Hernandez, Pepperdine University, Malibu, CA
Brian Hostetler, Virginia Tech, Blacksburg, VA
Jennifer Kuske, Pepperdine University, Malibu, CA


Finally, my deepest thanks go to Barbara Holland, senior editor at Harcourt/Academic 
Press, for making this text a reality. Her ability to read this manuscript through the eyes 
of the student has been one of my greatest encouragements. 


CHAPTER 0

Notation and Assumptions


Suppose you've just opened a new jigsaw puzzle. What are the first things you do?
First, you pour all the pieces out of the box. Then you sort through and turn them
all face up, taking a quick look at each one to determine whether it’s an inside or
outside piece, and you arrange them somehow so that you'll have an idea of where certain
types of pieces can be found later. You don’t study each piece in depth, nor do you start
trying to fit any of them together. In short, you just lay all the pieces out on the table and
briefly familiarize yourself with each one. This is the point of the game where you merely
set the stage, knowing that everything you'll need later has been put in a place where you
can find it when you need it.

In this introductory chapter we lay out all the pieces we will use for our work in this
course. It’s essential that you read it now, in part because you need some preliminary
exposure to the ideas, but mostly because you need to have spelled out precisely what
you can use without proof in Part I, where this chapter will serve you as a reference. Give
this chapter a casual but complete reading for now. You've probably seen most of the
ideas before. But don’t try to remember it all, and certainly don’t expect to understand
everything either. That’s not the point. Right now, we’re just organizing the pieces. The two
issues we address in this chapter are: 1) set terminology and notation; and 2) assumptions
about the real numbers.


0.1 Set Terminology and Notation


Sets are perhaps the most fundamental mathematical entity. Intuitively we think of a set 
as a collection of things, where the collection itself is thought of as a single entity. Sets 
may contain numbers, points in the xy-plane, functions, ice cream cones, steak knives, 
worms, even other sets. We will denote many of our sets with uppercase letters (A, B, C),
or sometimes with scripted letters (for example, ℱ or 𝒥). First we need a way of stating whether a
certain thing is or is not in a set.


Definition 0.1.1. If A is a set and x is an entity in A, we write x ∈ A, and say that x is an
element of A. To write x ∉ A is to mean that x is not an element of A.


How can you communicate to someone what the elements of a set are? There are 
several ways. 


1. List them. If there are only a few elements in the set, you can easily list them all. 
Otherwise, you might start listing the elements and hope that the reader can take the 
hint and figure out the pattern. For example, 


(a) {1, 8, 2, Monday}
(b) {0, 1, 2, ..., 40}
(c) {..., −6, −4, −2, 0, 2, 4, 6, ...}


2. Provide a description of the criteria used to decide whether an entity is to be included. 
This is how it works: 


(a) {x : x is a real number and x > −1} This notation should be read “the set of all
x such that x is a real number and x is greater than −1.” The variable x is just
an arbitrary symbol chosen to represent a random element of the set, so that
any characteristics it must have can be stated in terms of that symbol.


(b) {p/q : p and q are integers and q ≠ 0} This assumes you know what an integer
is (see the list of commonly used sets later in this section). This is the set of all
fractions, integer over integer, where it is expressly stated that the denominator
cannot be zero.


(c) {x : P(x)} This is a generic form for this way of describing a set. The expression 
P(x) represents some specified property that x must have in order to be in the set. 
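
The generic form {x : P(x)} has a close analogue in the set comprehensions of a programming language. The following Python sketch is only an illustration and is not part of the text's development: the finite universal set U and the property P are hypothetical stand-ins, chosen because a program cannot list all real numbers.

```python
# Hypothetical finite universal set, chosen only for illustration.
U = set(range(-5, 6))          # the integers from -5 through 5

def P(x):
    """Illustrative property: x is greater than -1."""
    return x > -1

# The set {x : x is in U and P(x)}, written as a set comprehension.
A = {x for x in U if P(x)}

# Membership tests mirror "x is an element of A" and "x is not an element of A".
print(sorted(A))
print(3 in A, -3 not in A)
```

Changing the body of P changes which elements are admitted, just as changing the stated property changes the set being described.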


When addressing the elements of a set, or more important, when addressing all 
things not in a particular set, we must have some universal limiting parameters in mind. 
Although it might not be explicitly stated, we generally consider that there is some 
universal set beyond which we do not concern ourselves. Then, if we’re talking about 
everything not in a set A, we know how far to cast the net. It is limited by our universal 
set, typically denoted U. 

To help visualize sets and how they compare and combine, we sometimes sketch what 
is called a Venn diagram. Given a set A within a universal set U, we may sketch a Venn 
diagram as in Fig. 0.1. 

Given two sets A and B, it just might happen that all elements of A are also elements
of B. We write this as A ⊆ B and say that A is a subset of B. Equivalently, we may write
B ⊇ A, and say that B is a superset of A. If A is a subset of B, but there are elements
of B that are not in A, we say that A is a proper subset of B, and write this A ⊂ B. The
relationship A ⊆ B can be displayed in the Venn diagram in Fig. 0.2. The region outside
A but inside B may or may not have any elements.

Given two arbitrary sets A and B, the standard Venn diagram is like the one in 
Fig. 0.3. Unless we know some special characteristics of A and B, we'll draw most of our 
Venn diagrams in this way. 


Figure 0.1 A basic Venn diagram.


Figure 0.2 Venn diagram with A ⊆ B.


Figure 0.3 A standard Venn diagram with two sets.


Venn diagrams are handy for visualizing new sets formed from old ones. Here are a
couple of examples. 


Definition 0.1.2. Given a set A, the set A′ is called the complement of A, and is defined
as the set of all elements of U that are not in A (Fig. 0.4). That is,


A′ = {x : x ∈ U and x ∉ A}    (0.1)


Figure 0.4 Shaded region represents A′.


Definition 0.1.3. Given two sets A and B, we define their union ∪ and intersection ∩ in
the following way:


A ∪ B = {x : x ∈ A or x ∈ B}    (0.2)
A ∩ B = {x : x ∈ A and x ∈ B}    (0.3)


An entity is allowed to be in A ∪ B precisely when it is in at least one of A, B. An entity is
allowed to be in A ∩ B precisely when it is in both A and B. See Figs. 0.5 and 0.6.


Figure 0.5 Shaded region represents A ∪ B.


Figure 0.6 Shaded region represents A ∩ B.
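
For small finite sets, the complement, union, and intersection just defined can be computed directly. The sketch below uses Python's built-in set type; the particular sets U, A, and B are hypothetical examples, not taken from the text.

```python
# Hypothetical universal set and two subsets, chosen only for illustration.
U = set(range(10))             # the whole numbers 0 through 9
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

complement_A = U - A           # all x in U that are not in A
union = A | B                  # all x in A or in B
intersection = A & B           # all x in both A and B

print(sorted(complement_A))
print(sorted(union))
print(sorted(intersection))

# Subset tests: <= means "is a subset of", < means "is a proper subset of".
print(intersection <= A, A <= A, A < A)
```

Note that A is a subset of itself but not a proper subset of itself, which matches the distinction drawn earlier between the two notions.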


Finally, we provide the notation for commonly used sets. Famous sets we need to 
know include the following: 


Empty set: ∅ = { } (the set with no elements)
Natural numbers: N = {1, 2, 3, ...}
Whole numbers: W = {0, 1, 2, 3, ...}
Integers: Z = {..., −3, −2, −1, 0, 1, 2, 3, ...}
Rational numbers: Q = {p/q : p, q ∈ Z, q ≠ 0}
Real numbers: R (Explained in what follows)


0.2 Assumptions 


One big question we’ll face at the outset of this course is what we are allowed to assume 
and what we must justify with proof. The purpose of this section is to provide a framework 
for the way you work with real numbers, spelling out the properties you may assume and 
reminding you of how to visualize them. 


0.2.1 Basic algebraic properties of real numbers 


The real numbers R, as well as its familiar subsets N, W, Z, and Q, are assumed to be
endowed with the relation of equality and the operations of addition and multiplication,
and to have the following properties. First, equality is assumed to behave in the following
way:

(A1) Properties of equality:

(a) For every a ∈ R, a = a (Reflexive property);
(b) If a = b, then b = a (Symmetric property);
(c) If a = b and b = c, then a = c (Transitive property).


The first property of addition we assume concerns its predictable behavior, even when 
the numbers involved can be addressed by more than one name. For example, 3/8 and 
6/16 are different names for the same number. We need to know that adding something 
to 3/8 will always produce the same result as adding it to 6/16. The following property 
is our way of stating this assumption. 


(A2) Addition is well defined: That is, if a, b, c, d ∈ R, where a = b and c = d,
then a + c = b + d.


A special case of property A2 yields a familiar principle that goes all the way back to
your first days of high school algebra: If a = b, then, since c = c, we have that
a + c = b + c.
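
The example of 3/8 and 6/16 can be checked with exact rational arithmetic. The sketch below uses Python's standard fractions module; the choice of 1/4 as the number to add is an arbitrary illustration. It only demonstrates what A2 asserts for one case and is not a proof of anything.

```python
from fractions import Fraction

a = Fraction(3, 8)
b = Fraction(6, 16)   # a different name for the same number
c = Fraction(1, 4)    # arbitrary number to add, chosen for illustration

print(a == b)             # the two names denote the same number
print(a + c == b + c)     # adding c to either name gives the same result
print(a + c)
```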


(A3) Closure property of addition: For every a, b ∈ R, a + b ∈ R. That is, the
sum of two real numbers is still a real number. This closure property also holds for
N, W, Z, and Q.


(A4) Associative property of addition: For every a, b, c ∈ R,
(a + b) + c = a + (b + c).


Addition is what we call a binary operation, meaning it combines exactly two numbers
to produce a single number result. If we have three numbers a, b, and c to add up, we
must split the task into two steps of adding two numbers. Property A4 says it doesn’t
matter which two, a and b, or b and c, we add first. It motivates us to use the more lax
notation a + b + c.


(A5) Commutative property of addition: For every a, b ∈ R, a + b = b + a.


If you're not careful, you'll tend to assume order does not matter when two things are
combined in a binary operation. There are plenty of situations where order does matter, 
as we'll see. 
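
As a small preview of operations where order matters, the following Python sketch compares addition with two other binary operations. The sample values, and the choice of subtraction and string concatenation as examples, are illustrative assumptions and do not come from the text.

```python
# Sample values chosen only for illustration.
x, y = 3, 5
s, t = "ab", "cd"

addition_commutes = (x + y == y + x)        # order does not matter here
subtraction_commutes = (x - y == y - x)     # order matters: -2 versus 2
concatenation_commutes = (s + t == t + s)   # order matters: "abcd" versus "cdab"

print(addition_commutes, subtraction_commutes, concatenation_commutes)
```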


(A6) Existence of an additive identity: There exists an element 0 ∈ R with the
property that a + 0 = a for every a ∈ R.


(A7) Existence of additive inverses: For every a ∈ R, there exists some b ∈ R such
that a + b = 0. Such an element b is called an additive inverse of a, and is typically
denoted −a to show its relationship to a. We do not assume that only one such b
exists.


Properties similar to A2–A7 hold for multiplication.


(A8) Multiplication is well defined: That is, if a, b, c, d ∈ R, where a = b and
c = d, then ac = bd.


(A9) Closure property of multiplication: For all a, b ∈ R, a · b ∈ R. The closure
property of multiplication also holds for N, W, Z, and Q.


(A10) Associative property of multiplication: For every a, b, c ∈ R,
(a · b) · c = a · (b · c) or (ab)c = a(bc).


(A11) Commutative property of multiplication: For every a, b ∈ R, ab = ba.


(A12) Existence of a multiplicative identity: There exists an element 1 ∈ R with
the property that a · 1 = a for every a ∈ R.


(A13) Existence of multiplicative inverses: For every a ∈ R except a = 0, there
exists some b ∈ R such that ab = 1. Such an element b is called a multiplicative
inverse of a and is typically denoted a⁻¹ to show its relationship to a. As with additive
inverses, we do not assume that only one such b exists. Furthermore, the assumption
that a⁻¹ exists for all a ≠ 0 does not assume that zero does not have a multiplicative
inverse. It says nothing about zero at all.


The next property describes how addition and multiplication interact. 


(A14) Distributive property of multiplication over addition: For every a, b,
c ∈ R, a(b + c) = (ab) + (ac) = ab + ac, where the multiplication is assumed to
be done before addition in the absence of parentheses.


Property A14 is important because it’s the only link between the operations of addition
and multiplication. Several important properties of R owe their existence to this
relationship. For example, as we'll see later, the fact that a · 0 = 0 for every a ∈ R is a
direct result of the distributive property, and not something we simply assume.

From addition and multiplication we create the operations of subtraction and
division, respectively. Knowing that additive and multiplicative inverses exist (except
for 0⁻¹), we write


a − b = a + (−b)    (0.4)
a/b = a · b⁻¹    (0.5)


One very important assumption we need concerns properties A6 and A12. For reasons
you'll see later, we need to assume that the additive identity is different from the
multiplicative identity. That is, we need the assumption


(A15) 1 ≠ 0.


We'll use these very basic properties to derive some other familiar properties of real 
numbers in Chapter 2. 


0.2.2 Ordering of real numbers


One standard way of comparing two real numbers is with the greater than symbol >.
Intuitively, you think of the statement a > b as meaning that a is to the right of b on the
number line. Although this is helpful, the comparison a > b is actually a bit sticky. The
nuts and bolts of > are contained in the following. In A16, we make an assumption about
how all real numbers compare to zero by >, thus giving meaning to the terms positive
and negative. Then in A17 and A18, we make some assumptions about how the positive
real numbers behave.


(A16) Trichotomy law: For any a ∈ R, exactly one of the following is true:

(a) a > 0, in which case we say a is positive;
(b) a = 0;
(c) 0 > a, in which case we say a is negative.


(A17) If a > 0 and b > 0, then a + b > 0. That is, the set of positive real numbers
is closed under addition.


(A18) If a > 0 and b > 0, then ab > 0. That is, the set of positive real numbers is
closed under multiplication.


Now we can use A16–A18 to give meaning to other statements comparing two arbitrary
real numbers a and b.


Definition 0.2.1. If a, b ∈ R, we say that a > b if a − b > 0. The statement a < b
means b > a. The statement a ≥ b means that either a > b or a = b. Similarly, a ≤ b
means either a < b or a = b.
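
Definition 0.2.1 builds every comparison out of the single notion "is positive." The sketch below mirrors that layering in Python. The function names are hypothetical, and is_positive simply leans on Python's built-in comparison with zero as a stand-in for the assumption in A16, so this illustrates the structure of the definition rather than any construction of the ordering.

```python
from fractions import Fraction

def is_positive(x):
    """Stand-in for the assumed notion 'x > 0' (uses Python's built-in comparison)."""
    return x > 0

def greater(a, b):
    """a > b is defined to mean that a - b is positive."""
    return is_positive(a - b)

def less(a, b):
    """a < b is defined to mean b > a."""
    return greater(b, a)

def greater_or_equal(a, b):
    """a >= b means either a > b or a = b."""
    return greater(a, b) or a == b

def less_or_equal(a, b):
    """a <= b means either a < b or a = b."""
    return less(a, b) or a == b

# Sample values chosen only for illustration.
p, q = Fraction(3, 8), Fraction(1, 4)
print(greater(p, q), less(p, q), greater_or_equal(p, p), less_or_equal(q, p))
```

Only is_positive touches the ordering directly; every other function is defined in terms of it, which is the point of the definition.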


The rest of the properties of real numbers are probably not as familiar as the preceding 
ones, but their roles in the theory of real numbers will be clarified in good time. As with 
the above properties, we do not try to justify them. We merely accept them and use 
them as a basis for proofs. A very important property of the whole numbers is the 
following. 


(A19) Well-ordering principle: Any nonempty subset of W (or N for that matter)
has a smallest element. That is, if A ⊆ W (or A ⊆ N), and A is nonempty, then there
is some number a ∈ A with the property that a ≤ x for all x ∈ A. In particular, 1 is
the smallest natural number.


The next property of R is a bit complicated, but is indispensable in the theory of real
numbers. Read it casually the first time, but know that it will be very important in Part II
of this text. Suppose A ⊆ R is a nonempty set with the property that it is bounded from
above. That is, suppose there is some M ∈ R with the property that a ≤ M for all a ∈ A.
For example, let A = {x : x² < 10}. Clearly M = 4 is a number such that every a ∈ A
satisfies a ≤ M. So 4 is an upper bound for the set A. There are other upper bounds for
A, such as 10, 3.3, and 3.17. The point to be made here is that, among all upper bounds
that exist for a set, there is an upper bound that is smallest, and it exists in R. This is
stated in the following.


(A20) Least upper bound property of R: If A is a nonempty subset of R that is
bounded from above, then there exists a least upper bound in R. That is, if there
exists some M ∈ R with the property that a ≤ M for all a ∈ A, then there will also
exist some L ∈ R with the following properties:

(L1) For every a ∈ A, we have that a ≤ L, and

(L2) If N is any upper bound for A, it must be that N ≥ L.
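
For the example set A = {x : x² < 10}, the least upper bound can be approximated numerically. The sketch below uses bisection, a technique that is not part of the text. It assumes a starting point 0 known to be in A and the upper bound 4 mentioned above, and it works with floating-point numbers, so it produces an approximation only and proves nothing about existence.

```python
def in_A(x):
    """Membership test for A = {x : x^2 < 10}."""
    return x * x < 10

def approximate_least_upper_bound(lo, hi, steps=50):
    """Bisection sketch. Assumes 0 <= lo, lo is in A, and hi is an upper bound for A.
    Each step halves hi - lo while keeping both of those facts true."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if in_A(mid):
            lo = mid      # mid is in A, so no upper bound can be below mid
        else:
            hi = mid      # mid >= 0 and mid^2 >= 10, so mid is an upper bound
    return hi

L = approximate_least_upper_bound(0, 4)
print(L)

# The other upper bounds named in the text are all at least as large as L.
print(all(N >= L for N in (4, 10, 3.3, 3.17)))
```

The printed value agrees with the square root of 10 to many decimal places, which is consistent with 3.17 being an upper bound while 3.16 is not.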


0.2.3 Other assumptions about R


The real numbers are indeed a complicated set. The final two properties of R we mention 
are not standard assumptions, and they deserve your attention at some point in your 
mathematical career. In this text, we assume them. 


(A21) The real numbers can be equated with the set of all base 10 decimal representations.
That is, every real number can be written in a form like 338.1898..., where
the decimal might or might not terminate, and might or might not fall into a pattern
of repetition.

Furthermore, every decimal form you can construct represents a real number.
Strangely, though, there might be more than one decimal representation for a certain
real number. You might remember that 0.9999... = 1. (See Section 2.5.1.) The
repeating 9 is the only case where more than one decimal representation is possible.
We'll assume this.
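
To see why 0.9999... = 1 is at least plausible, one can compute the truncated decimals 0.9, 0.99, 0.999, and so on exactly, and watch how far each falls short of 1. The Python sketch below does this with exact fractions; the helper name partial_sum is hypothetical. It does not prove the equality, which is taken as part of assumption A21.

```python
from fractions import Fraction

def partial_sum(n):
    """Exact value of the decimal 0.99...9 with n nines."""
    return sum(Fraction(9, 10 ** k) for k in range(1, n + 1))

# The gap between 1 and the truncated decimal shrinks by a factor of 10 each time.
for n in (1, 2, 5):
    print(n, partial_sum(n), 1 - partial_sum(n))
```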


Our final assumption concerns the existence of roots of real numbers. 


(A22) For every positive real number x and any n ∈ N, there exists a real number
solution y to the equation yⁿ = x. Such a solution y is called an nth root of x. The
common notation ⁿ√x will be addressed in Section 2.8.


Notice we make no assumptions about how many such roots of x there are, or 
what their signs are. Nor do we assume anything about roots of zero or of negative real 
numbers. We'll derive these from assumption A22. 

One final comment about assumptions in mathematics. In a rigorous development of 
any mathematical theory, some things must be assumed without proof, that is, they must 
be axiomatic, serving as an agreed-on starting place for the mathematician’s thinking. 
In a study of the real numbers, some of the assumptions A1–A22 are standard. Others
would be considered standard assumptions only for some subsets of R, perhaps for W. 
The mathematician would then very painstakingly apply assumptions made to W in 
order to expand the same properties to all of R. One assumption in particular, A21, is 
a most presumptuous one. But let us make no apologies for this. After all, many of the 
foundational issues in mathematics were addressed very late historically, and this is not a 
course in the foundations of mathematics. It is a course to teach us how mathematics is 
done, and to give us some enjoyment of that process. We choose assumptions here that 
likely coincide with your current idea of a reasonable place to start. In some cases, we'll 
dig more deeply as we go, though some of the foundational work will come in your later 
courses. 


CHAPTER 1


1.1 Introduction to Logic


Mathematicians make as much use of language as anyone else. Not only do they com- 
municate their mathematical work with language, but the use of language is itself part of 
the mathematical structure. In this chapter, we lay out some of the principles that govern 
the mathematician’s use of language. 


1.1.1 Statements 


The first issue we address is what kinds of sentences mathematicians use as building 
blocks for their work. Remember from elementary school grammar that sentences are 
generally divided into four classes: 


Declarative sentences: We also call these statements. Here are some examples: 
1. Labor Day is the first Monday in September. 
2. Earthquakes don’t happen in California. 
3. Three is greater than seven. 
4. The world will end on June 6, 2040. 
5. The sky is falling. 


One characteristic of statements that jumps out at you is that they generally evoke 
a reaction such as, “Yeah, that’s true,” or “No way,” or even “That could be true, but I 
don’t know for sure.” Statement 1 is true, statements 2 and 3 are false, while statements 4 
and 5 are uncertain. We cannot know whether statement 4 is true, but we can say that 
its truth or falsity will be revealed. Statement 5, however, is curious. If you were going
to investigate the truth or falsity of this statement, you would immediately be faced with 
the problem of what the terms mean. How do we define “the sky,” and what precisely 
does it mean to say that it “is falling”? 


Imperative sentences: We would call these commands. 
1. Don’t wash your red bathrobe with your white underwear. 


2. Knock three times on the ceiling if you want me. 


Interrogative sentences: That is, questions. 
1. How much is that doggy in the window? 


2. Have you always been that ugly? 


Exclamations: 
1. What a day! 
2. Lions and tigers and bears, Oh My! 


3. So, like, whatever. 


The mathematician’s work centers around the first category, but we have to be care- 
ful about exactly which declarative sentences we allow. We will define a statement in- 
tuitively as a sentence that can be assigned either to the class of things we would call 
TRUE or to the class of things we would call FALSE. Let’s conduct a little thought 
experiment. 

Imagine the set of all conceivable statements, and call it S. Naturally, this set is
frighteningly large and complex, but a most important characteristic of its elements is
that each one can be placed into exactly one of two subsets: T (statements called TRUE),
and F (statements called FALSE). Sometimes you might have trouble recognizing whether
a sentence even belongs in S. There are, after all, things called paradoxes. For example,
“This sentence is false” cannot be either true or false. If you think the sentence is
true, then it is false. However, if it is false, then it is true. We don’t want to allow these
types of sentences in S.

Now that we have the set of all statements partitioned into T and F, the true and
false ones, respectively, we want to look at relationships between them. Specifically, we
want to pick statements from S, change or combine them to make other statements in
S, and lay out some understandings of how the truth or falsity of the chosen statements
determines the truth or falsity of the alterations and combinations. In the next part of
this section, we discuss three ways of doing this:


• the negation of a statement;
• a compound statement formed by joining two given statements with AND; and
• a compound statement formed by joining two given statements with OR.
1.1.2 Negation of a statement


We generally use p, q, r, and so forth, to represent statements symbolically. For example,
define a statement p as follows:


p: Megan has rented a car for today. 


Now consider the negation or denial of p, which we can create by a strategic placement 
of the word NOT somewhere in the statement. We write it this way: 


¬p: Megan has not rented a car for today.


Naturally, if p is true, then ¬p is false, and vice versa. We illustrate this in a truth
table (Table 1.1).


p | ¬p
T | F
F | T

Table 1.1


Definition 1.1.1. Given a statement p, we define the statement ¬p (not p) to be false
when p is true, and true when p is false, as illustrated in Table 1.1.


1.1.3 Combining statements with AND/OR 


When two statements are joined by AND or OR to produce a compound statement, we 
need a way of deciding whether the compound statement is true or false based on the 
truth or falsity of its component statements. Let’s build these with an example. Define 
statements p and q as follows: 


p: Megan is at least 25 years old. 
q: Megan has a valid driver’s license. 


Now let’s create the statement we call “p and q,” which we write as 


p ∧ q: Megan is at least 25 years old, and she has a
valid driver’s license.


If you know the truth or falsity of p and q individually, how would you be inclined
to categorize p ∧ q?¹ Naturally, the only way that we would consider p ∧ q to be true is if
both p and q are true. In any other instance, we would say p ∧ q is false. Thus, whether
p ∧ q is in T or F depends on whether p and q are in T or F individually. We illustrate
the results of all different combinations in Truth Table 1.2. Notice how the truth table is
constructed with four rows, systematically displaying all possible combinations of T and
F for p and q.


p | q | p ∧ q
T | T | T
T | F | F
F | T | F
F | F | F

Table 1.2


¹Pretend Megan is standing at a car rental counter and must answer yes or no to the question “Are you
at least 25 years old and have a valid driver’s license?”


Definition 1.1.2. Given two statements p and q, we define the statement p ∧ q (p and q)
to be true precisely when both p and q are true, as illustrated in Table 1.2.


Now let’s join two statements with OR. Define statements p and q by

p: Megan has insurance that covers her for any car she drives.
q: Megan bought the optional insurance provided by the car
rental company.


The compound statement we call “p or q” is written:


p ∨ q: Megan has insurance that covers her for any car she drives,
or she bought the optional insurance provided by the
car rental company.


How should we assign T or F to p ∨ q based on the truth or falsity of p and
q individually?² We define p ∨ q to be true if at least one of p, q is true. See Truth
Table 1.3.


p | q | p ∨ q
T | T | T
T | F | T
F | T | T
F | F | F

Table 1.3


²Pretend Megan’s friend, who is worried about being covered in case of an accident, asks her “Do you
have your own insurance, or did you buy the optional coverage provided by the rental company?”
Under what circumstances should she say yes?


Definition 1.1.3. Given two statements p and q, we define the statement p ∨ q (p or q)
to be true precisely when at least one of p and q is true, as illustrated in Table 1.3.


This scenario illustrates why we consider p ∨ q to be true even when both p and q
are true. We are just as happy if Megan is doubly covered by insurance rather than by
one policy only. Because our conversational language is sometimes ambiguous when we
use the word OR, we distinguish between two types of OR compound statements. We
call ∨ the inclusive OR. If you want to discuss an OR statement where you mean p or q but
not both, use the exclusive OR, p ⊻ q. See Truth Table 1.4. Although we'll not use this very
often, it is a nice term to have. Just remember that use of the word OR in mathematical
writing always means inclusive OR. For example, when we say xy = 0 implies that either
x = 0 or y = 0, we include the possibility that both x and y are zero.


p | q | p ⊻ q
T | T | F
T | F | T
F | T | T
F | F | F


Table 1.4 
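
If you would like to experiment, the following is a minimal Python sketch (our own illustration, not part of the development in the text) that contrasts the two kinds of OR. The names inclusive_or and exclusive_or are made up for this example, and Python's True and False stand in for T and F.

```python
def inclusive_or(p, q):
    # True when at least one of p, q is true (Table 1.3)
    return p or q

def exclusive_or(p, q):
    # True when exactly one of p, q is true (Table 1.4)
    return p != q

# Rows in the same order as the tables: TT, TF, FT, FF
rows = [(p, q, inclusive_or(p, q), exclusive_or(p, q))
        for p in (True, False) for q in (True, False)]

for p, q, inc, exc in rows:
    print(p, q, inc, exc)
```

The two columns differ only in the first row, where both p and q are true.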


Now we can build all kinds of compound statements. 


Example 1.1.4. Construct truth tables for the statements


1. (p ∧ q) ∨ (¬p ∧ ¬q),
2. p ∧ (q ∨ r),
3. (p ∧ q) ∨ (p ∧ r).


Solution: See Table 1.5 for part 1 and Table 1.6 for parts 2 and 3 of Example 1.1.4.
Some of the intermediate details are left to you in Exercise 1. Notice how we set up
the p, q, and r columns for more than two given statements.


p | q | ¬p | ¬q | p ∧ q | ¬p ∧ ¬q | (p ∧ q) ∨ (¬p ∧ ¬q)
T | T | F  | F  | T     | F       | T
T | F | F  | T  | F     | F       | F
F | T | T  | F  | F     | F       | F
F | F | T  | T  | F     | T       | T


Table 1.5 Solution to Example 1.1.4, part 1 
p | q | r | q ∨ r | p ∧ (q ∨ r) | p ∧ q | p ∧ r | (p ∧ q) ∨ (p ∧ r)
T | T | T | T     | T           | T     | T     | T
T | T | F | T     | T           | T     | F     | T
T | F | T | T     | T           | F     | T     | T
T | F | F | F     | F           | F     | F     | F
F | T | T | T     | F           | F     | F     | F
F | T | F | T     | F           | F     | F     | F
F | F | T | T     | F           | F     | F     | F
F | F | F | F     | F           | F     | F     | F


Table 1.6 Solutions to Example 1.1.4, parts 2 and 3 
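
Building tables like these by hand is good practice, but the systematic listing of T and F combinations is also easy to mechanize. Here is a minimal Python sketch, assuming we model a statement as a function of truth values. The helper names truth_table and format_table are our own inventions for this illustration.

```python
from itertools import product

def truth_table(num_vars, statement):
    """Return one row per assignment: the input values followed by the result."""
    rows = []
    # product([True, False], ...) lists T before F, matching the row order in the text
    for values in product([True, False], repeat=num_vars):
        rows.append(values + (bool(statement(*values)),))
    return rows

def format_table(headers, rows):
    lines = [" | ".join(headers)]
    for row in rows:
        lines.append(" | ".join("T" if value else "F" for value in row))
    return "\n".join(lines)

# Part 1 of Example 1.1.4: (p AND q) OR (NOT p AND NOT q)
part1 = truth_table(2, lambda p, q: (p and q) or ((not p) and (not q)))
print(format_table(["p", "q", "(p and q) or (not p and not q)"], part1))

# Part 2 of Example 1.1.4: p AND (q OR r), which needs eight rows
part2 = truth_table(3, lambda p, q, r: p and (q or r))
print(format_table(["p", "q", "r", "p and (q or r)"], part2))
```

With n given statements the table has 2ⁿ rows, which is why three statements require eight.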


1.1.4 Logical equivalence 


There are often several ways to say the same thing. We need to address the situation where 
two different constructs involving statements should be interpreted as having the same 
meaning, or as being logically equivalent. As a trivial example, consider that p ∧ q should
certainly be viewed as having the same meaning as q ∧ p. Building a truth table would
produce identical columns for p ∧ q and q ∧ p. This is the way we define our use of the
term logical equivalence.


Definition 1.1.5. Two statements are said to be logically equivalent if they have precisely
the same truth table values. If p and q are logically equivalent, we write p ⇔ q.


Look at parts 2 and 3 of Example 1.1.4. To illustrate the reasonableness of declaring 
p ∧ (q ∨ r) logically equivalent to (p ∧ q) ∨ (p ∧ r), consider the following criteria for
being allowed to rent a car: 


p: Megan has a valid driver’s license. 
q : Megan has her own insurance policy. 
r: Megan bought the rental company’s insurance coverage. 


Notice that saying “p AND either q or r” has the same meaning to us as “p and q, OR
p and r.” This is a sort of distributive property; that is, ∧ distributes over ∨, exactly
like multiplication distributes over addition in real numbers. Exercise 5 asks you to
demonstrate that ∨ also distributes over ∧.
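
Since logical equivalence means identical truth table columns, it can be checked by comparing two statements on every combination of truth values. The following Python sketch does this for the distributive property just described. The function name logically_equivalent is hypothetical, chosen only for this illustration.

```python
from itertools import product

def logically_equivalent(first, second, num_vars):
    """True when the two statements agree on every row of the truth table."""
    return all(bool(first(*values)) == bool(second(*values))
               for values in product([True, False], repeat=num_vars))

# p AND (q OR r) compared with (p AND q) OR (p AND r)
left = lambda p, q, r: p and (q or r)
right = lambda p, q, r: (p and q) or (p and r)
distributes = logically_equivalent(left, right, 3)

# A pair that is not equivalent, for contrast: p AND q compared with p OR q
and_vs_or = logically_equivalent(lambda p, q: p and q, lambda p, q: p or q, 2)

print(distributes, and_vs_or)
```

A check like this confirms an equivalence but does not explain it, so it is no substitute for working through Exercise 5 yourself.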


1.1.5 Tautologies and contradictions 


Sometimes a truth table column just happens to have TRUE values all the way down;
for example, p ∨ ¬p. A statement such as “Either Megan has a valid driver’s license, or
she does not” would make you think, “Of course!” or “Naturally this is a true statement
regardless of the circumstances.” A statement whose truth table values are all TRUE is
called a tautology. You’ll do the following in Exercise 6.


Example 1.1.6. Show that ¬(p ∧ q) ∨ (p ∨ q) is a tautology.


A statement such as the one in Example 1.1.6 would be very confusing if expressed 
in English form. It would read something like, 


Either it is not true that Dave has brown hair and green eyes, or he 
has either brown hair or green eyes. 


One last item. The negation of a tautology is called a contradiction. The truth table 
values of a contradiction are all FALSE. In the same way that a tautology is the kind of 
statement that makes you think “Of course!,” a contradiction makes you think “No way 
can that ever be true!” A really easy example of a contradiction is p ∧ ¬p. Since p and
¬p cannot ever both be true, p ∧ ¬p is always false. Tautologies and contradictions are
very useful, as we will begin to see in Section 2.1. 
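
In computational terms, a tautology is a statement whose truth table column is all TRUE, and a contradiction is one whose column is all FALSE. A minimal Python sketch of both tests follows. The helper names truth_column, is_tautology, and is_contradiction are assumptions of this example, not standard library functions.

```python
from itertools import product

def truth_column(statement, num_vars):
    """The final truth table column of a statement, listed from TT...T down to FF...F."""
    return [bool(statement(*values))
            for values in product([True, False], repeat=num_vars)]

def is_tautology(statement, num_vars):
    return all(truth_column(statement, num_vars))

def is_contradiction(statement, num_vars):
    return not any(truth_column(statement, num_vars))

excluded_middle = lambda p: p or (not p)   # p OR NOT p
impossible = lambda p: p and (not p)       # p AND NOT p
ordinary = lambda p, q: p and q            # neither a tautology nor a contradiction

print(is_tautology(excluded_middle, 1), is_contradiction(impossible, 1))
print(is_tautology(ordinary, 2), is_contradiction(ordinary, 2))
```

Most statements, like p ∧ q, fall into neither category.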


EXERCISES 


1. Construct truth tables for the following statements: 


(a) p ∨ (q ∨ r)
(b) (p ∨ q) ∨ r
(c) p ∧ (q ∨ r)
(d) (p ∧ q) ∨ (p ∧ r)
2. Which of the statements in Exercise 1 are logically equivalent? 
3. Parts (a) and (b) of Exercise 1 show that ∨ has the associative property. We can
therefore allow ourselves the freedom to write p ∨ q ∨ r and understand it to mean
either (p ∨ q) ∨ r or p ∨ (q ∨ r). Does ∧ have the associative property? Verify your
answer with a truth table.


4. Below are several logical equivalences that are called DeMorgan’s laws (a name you'll 
want to remember). Verify these forms of DeMorgan’s laws with truth tables: 


(a) ¬(p ∧ q) ⇔ ¬p ∨ ¬q
(b) ¬(p ∨ q) ⇔ ¬p ∧ ¬q
(c) ¬(p ∧ q ∧ r) ⇔ ¬p ∨ ¬q ∨ ¬r
(d) ¬(p ∨ q ∨ r) ⇔ ¬p ∧ ¬q ∧ ¬r
5. Show that ∨ distributes over ∧.
6. Show that ¬(p ∧ q) ∨ (p ∨ q) from Example 1.1.6 is a tautology.


7. Construct a statement using only p, q, ∧, ∨, and ¬ that is logically equivalent to the
exclusive OR, p ⊻ q. Demonstrate logical equivalence with a truth table.
8. Use DeMorgan’s laws from Exercise 4 as a basis for symbolic substitution and manipulation
to show that ¬[(p ∨ q) ∧ r] is logically equivalent to (¬p ∧ ¬q) ∨ ¬r by
transforming the former statement into the latter. Use a similar technique to construct
a statement that is logically equivalent to ¬[p ∨ (q ∧ r)].


1.2 If-Then Statements


In this section, we want to do two things: 1) Consider the logical structure of the statement 
that p implies or necessitates q and its variations; and 2) return to the idea of logical 
equivalence and its connection to tautologies. We set the stage with a classic (albeit tired) 
example. 

In my junior high school days, there was a man named Mr. Shephard who would 
stand in the middle of the street in front of my school every afternoon and flag down cars 
to try to get a ride to the post office. What made this dangerous stunt notable was the 
way Mr. Shephard often dressed. If there was even a single cloud in the sky, Mr. S would 
certainly be wearing his raincoat and carrying his umbrella. However, there were also 
days without a cloud in the sky on which Mr. S would be wearing his rain attire. When I 
think back about that period of my life, I am struck with the following fact: On every day 
that there were any clouds, Mr. S was undoubtedly dressed for rain. Granted, there might 
have been some clear days that he also dressed for rain, but I am not referring to this 
possibility in my claim. I’m only making an observation about something that happened 
on cloudy days. 


1.2.1 If-then statements 


From this little story, we want first to analyze the correlation between the weather and 
Mr. Shephard’s attire. Specifically, we want to define logically with a truth table what we 
mean by a statement such as “If it is a cloudy day, then Mr. S wears his rain gear.” In 
general, we consider the statement “If p, then q,” which we write as p → q.

Let’s isolate one particular day, say day 1. Define statements 


p₁: Day 1 is a cloudy day.
q₁: Mr. S wears his rain gear on day 1.


Pretend for the moment that words like IF, THEN, and IMPLIES are not in your
vocabulary. How can we piece together a logical statement using only p₁, q₁, ∧, ∨, and
¬ that has the same sense as p₁ → q₁?³ There are several possible answers. Here are two.
Read them as sentences to understand their meaning.


¬p₁ ∨ (p₁ ∧ q₁)    (1.1)
¬p₁ ∨ q₁           (1.2)


³One way is to consider the two different weather possibilities and link one of them with
Mr. Shephard’s expected behavior.


Statements 1.1 and 1.2 are logically equivalent, but 1.2 is simpler, so let’s use it as our
definition of p → q. We build its truth table:


Definition 1.2.1. The statement p → q (read “If p, then q” or “p implies q”) is defined
to be a statement that is logically equivalent to ¬p ∨ q, as illustrated in Table 1.7. We call p
the hypothesis condition and q the conclusion.


p₁ | q₁ | ¬p₁ ∨ q₁ (⇔ p₁ → q₁)
T  | T  | T
T  | F  | F
F  | T  | T
F  | F  | T

Table 1.7


Notice from the definition that constructing a truth table for → produces FALSE
only when the hypothesis condition is true and the conclusion is false. 
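
Because the definition reduces → to ¬ and ∨, it translates directly into a one-line function. The sketch below is our own illustration, with implies as an invented name. It rebuilds the column of Table 1.7 and also confirms that the longer form (1.1) produces the same column as (1.2).

```python
def implies(p, q):
    # Definition 1.2.1: "if p, then q" is logically equivalent to (NOT p) OR q
    return (not p) or q

# Rows in the order TT, TF, FT, FF
pairs = [(p, q) for p in (True, False) for q in (True, False)]

column_1_1 = [(not p) or (p and q) for p, q in pairs]   # statement (1.1)
column_1_2 = [implies(p, q) for p, q in pairs]          # statement (1.2)

for (p, q), value in zip(pairs, column_1_2):
    print(p, q, value)
```

The only FALSE entry is in the row where the hypothesis condition is true and the conclusion is false.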


Example 1.2.2. Construct truth tables for the following statements. 


1. p → ¬q
2. (p ∧ q) → r


Solution: See Tables 1.8 and 1.9.


p | q | ¬q | p → ¬q
T | T | F  | F
T | F | T  | T
F | T | F  | T
F | F | T  | T

Table 1.8 Solution to Example 1.2.2, part 1


p | q | r | p ∧ q | (p ∧ q) → r
T | T | T | T     | T
T | T | F | T     | F
T | F | T | F     | T
T | F | F | F     | T
F | T | T | F     | T
F | T | F | F     | T
F | F | T | F     | T
F | F | F | F     | T

Table 1.9 Solution to Example 1.2.2, part 2
1.2.2 Variations on p → q


Given two statements p and q, we might want to analyze other possible correlations
between their truth besides p → q. Defining:


p: It is a cloudy day,
q: Mr. S dresses for rain,
we might address the following variations of p → q.


q → p: If Mr. S dresses for rain, then it is a cloudy day.
(Converse)


¬p → ¬q: If it is not a cloudy day, then Mr. S does not dress for rain.
(Inverse)


¬q → ¬p: If Mr. S does not dress for rain, then it is not a cloudy day.
(Contrapositive)


Example 1.2.3. Which, if any, of the statements p → q, q → p, ¬p → ¬q, and
¬q → ¬p are logically equivalent?


Solution: We construct a truth table (see Table 1.10). 


p | q | ¬p | ¬q | p → q | q → p | ¬p → ¬q | ¬q → ¬p
T | T | F  | F  | T     | T     | T       | T
T | F | F  | T  | F     | T     | T       | F
F | T | T  | F  | T     | F     | F       | T
F | F | T  | T  | T     | T     | T       | T


Table 1.10


Notice that the original statement p → q is logically equivalent to its contrapositive
¬q → ¬p, and that the converse q → p is logically equivalent to the inverse
¬p → ¬q.
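
The four columns of Table 1.10 can be regenerated with a few lines of Python. This is only a sketch for checking your own work. The function implies is our shorthand for ¬p ∨ q and is not a built-in.

```python
def implies(p, q):
    # "if p, then q" modeled as (NOT p) OR q
    return (not p) or q

# Rows in the order TT, TF, FT, FF
pairs = [(p, q) for p in (True, False) for q in (True, False)]

original = [implies(p, q) for p, q in pairs]
converse = [implies(q, p) for p, q in pairs]
inverse = [implies(not p, not q) for p, q in pairs]
contrapositive = [implies(not q, not p) for p, q in pairs]

print(original == contrapositive)  # original vs. contrapositive
print(converse == inverse)         # converse vs. inverse
print(original == converse)        # original vs. converse
```

Matching columns are exactly what logical equivalence requires, so the comparisons mirror the conclusion drawn from the table.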


There is one last construct involving if-then. Sometimes we want to consider the 
statement that we would call true precisely when p and q are either both true or both 
false. For example, if Mr. S had been firing on all cylinders we could have observed the 
following: Mr. S dressed for rain if it were cloudy, and if he dressed for rain, then it was 
a cloudy day. That is, he would have dressed for rain if and only if it were cloudy. 


Definition 1.2.4. Given statements p and q, the statement p ↔ q (read “p if and only if
q,” and often written “p iff q”) is defined to be true precisely when p and q are either both
true or both false.


How can you use →, ∧, and ∨ on p and q to construct a statement that is logically
equivalent to p ↔ q?⁴ One answer is in the next example, and you’ll create others in
Exercise 2.


Example 1.2.5. Show that (p → q) ∧ (q → p) is logically equivalent to p ↔ q.


Solution: See Table 1.11.


p | q | p ↔ q | p → q | q → p | (p → q) ∧ (q → p)
T | T | T     | T     | T     | T
T | F | F     | F     | T     | F
F | T | F     | T     | F     | F
F | F | T     | T     | T     | T


Table 1.11 
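
For readers who like to confirm such tables by machine, here is a small Python sketch. It assumes we model p ↔ q as equality of truth values, which matches Definition 1.2.4. The names implies and iff are our own.

```python
def implies(p, q):
    # "if p, then q" modeled as (NOT p) OR q
    return (not p) or q

def iff(p, q):
    # True precisely when p and q are both true or both false
    return p == q

# Rows in the order TT, TF, FT, FF
pairs = [(p, q) for p in (True, False) for q in (True, False)]

iff_column = [iff(p, q) for p, q in pairs]
conjunction_column = [implies(p, q) and implies(q, p) for p, q in pairs]

print(iff_column)
print(conjunction_column)
```

The two printed lists agree row by row, which is the content of Example 1.2.5.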


1.2.3 Logical equivalence and tautologies 

In Section 1.1, we defined two statements to be logically equivalent if they have exactly
the same truth table values. With the definition of p ↔ q, we can now offer an alternate
definition of logical equivalence.


Definition 1.2.6. Two statements p and q are logically equivalent if the statement p ↔ q
is a tautology.


As we noted in Section 1.1.4, the significance of statements being logically equivalent
is that they are different ways of saying precisely the same thing. If p is logically equivalent
to q, then knowing p is true guarantees that q is true, and vice versa.

If p ↔ q is a tautology, what does that say about p → q and q → p separately?⁵ If
p ↔ q is a tautology, then p → q and q → p must both be tautologies, too. Loosely
speaking, the truth of p is sufficiently strong to imply the truth of q and vice versa. That
is, in any case where p is true, it can also be noted that q will, without exception, be true,
and vice versa.


Example 1.2.7. Show that p → q and ¬q → ¬p are logically equivalent using
Definition 1.2.6.


Solution: We already know that the truth table columns for p → q and ¬q → ¬p
are identical, but we are asked to use Definition 1.2.6. Therefore, we construct
(p → q) ↔ (¬q → ¬p) and show that it is a tautology. Writing p → q as U
and ¬q → ¬p as V, the details are displayed in Table 1.12. Since the last column
of Table 1.12 is a tautology, U and V are logically equivalent. Notice that the
columns U → V and V → U must both be tautologies for this last column to be a tautology.


p | q | ¬p | ¬q | U | V | U → V | V → U | (U → V) ∧ (V → U)
T | T | F  | F  | T | T | T     | T     | T
T | F | F  | T  | F | F | T     | T     | T
F | T | T  | F  | T | T | T     | T     | T
F | F | T  | T  | T | T | T     | T     | T

Table 1.12


⁴Take a hint from the sentence, “For example, if Mr. S had been firing on all cylinders ....”
⁵Remember p ↔ q is defined by (p → q) ∧ (q → p).


Now let’s consider the situation where p ↔ q is not a tautology, but one of p → q
or q → p is. If p → q is a tautology while q → p is not, then we say that p is a stronger
statement than q. This means that the truth of p necessitates the truth of q, but the truth
of q is not necessarily accompanied by the truth of p.


Example 1.2.8. Which statement is stronger, p or p A q? Verify with a truth table (see 
Table 1.13). 


p | q | p ∧ q | (p ∧ q) → p | p → (p ∧ q)
T | T | T     | T           | T
T | F | F     | T           | F
F | T | F     | T           | T
F | F | F     | T           | T

Table 1.13


Solution: Since (p ∧ q) → p is a tautology while p → (p ∧ q) is not, p ∧ q is
stronger than p. Knowing p ∧ q is true guarantees that p is true, but knowing p is
true does not guarantee that p ∧ q is true.


Example 1.2.9. Which statement do you think is stronger, (p ∧ q) → r or p → r?
Determine for sure with a truth table.


Solution: Writing (p ∧ q) → r as U and p → r as V, we need truth table values
for U → V and V → U. (See Table 1.14.) Since V → U is a tautology and U → V
is not, V is stronger than U.


p | q | r | U: (p ∧ q) → r | V: p → r | U → V | V → U
T | T | T | T              | T        | T     | T
T | T | F | F              | F        | T     | T
T | F | T | T              | T        | T     | T
T | F | F | T              | F        | F     | T
F | T | T | T              | T        | T     | T
F | T | F | T              | T        | T     | T
F | F | T | T              | T        | T     | T
F | F | F | T              | T        | T     | T

Table 1.14


Notice this important fact. Statements U and V from Example 1.2.9 have the same
conclusion, but the hypothesis condition for U (p ∧ q) is stronger than the hypothesis
condition for V (p). However, U is a weaker statement than V. Exercise 4 asks you to
investigate and explain from the truth table why an if-then statement is weakened when
the hypothesis condition is strengthened.
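
The comparison of strength used in Examples 1.2.8 and 1.2.9 comes down to asking which of the two implications between the statements is a tautology. The Python sketch below packages that test. The function compare_strength and the strings it returns are our own choices for this illustration.

```python
from itertools import product

def implies(p, q):
    # "if p, then q" modeled as (NOT p) OR q
    return (not p) or q

def compare_strength(first, second, num_vars):
    """Decide which implication between two statements holds on every row."""
    assignments = list(product([True, False], repeat=num_vars))
    first_implies_second = all(implies(first(*v), second(*v)) for v in assignments)
    second_implies_first = all(implies(second(*v), first(*v)) for v in assignments)
    if first_implies_second and second_implies_first:
        return "logically equivalent"
    if first_implies_second:
        return "first is stronger"
    if second_implies_first:
        return "second is stronger"
    return "neither"

# Example 1.2.8: p AND q compared with p
example_1_2_8 = compare_strength(lambda p, q: p and q, lambda p, q: p, 2)

# Example 1.2.9: U is (p AND q) -> r, V is p -> r
U = lambda p, q, r: implies(p and q, r)
V = lambda p, q, r: implies(p, r)
example_1_2_9 = compare_strength(U, V, 3)

print(example_1_2_8)
print(example_1_2_9)
```

In the second call, U is passed first and V second, so the result says that V is the stronger statement, in agreement with Table 1.14.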

What is the significance of one statement being stronger than another? Here is an 
example. Define the following statements: 


p: Megan is at least 25 years old. 
q: Megan has a valid driver’s license. 


r: Megan is allowed to rent a car. 


What does it mean to say that V (p → r) is stronger than U ((p ∧ q) → r)?⁶ Statement U
says that age and a license will guarantee your eligibility to rent a car. Statement V says
that age alone is sufficient to be eligible. Thus, if everyone at least 25 years old can rent
a car, then certainly everyone at least 25 with a license can, too. Thus if V is true, so
is U. On the other hand, just because licensed people at least 25 years old can rent a
car, it does not follow that all people at least 25 years old can do the same. That is, U does not
imply V.


Example 1.2.10. Without justifying by proof, state whether the following statements 
are logically equivalent, whether one is stronger than the other, or neither. 


1. p: x is an integer that is divisible by 6.
q: x is an integer that is divisible by 3.
2. p: x³ − 4x² + 4x = 0
q: x ∈ {0, 2, 4}


⁶If age is sufficient for being allowed to rent a car, what does that say about age and a driver’s license?

3. p: −2 < x < 2
ee ae | 
Solution: 

1. If x is divisible by 6, then it is divisible by both 2 and 3. That is, divisibility by 6 
necessitates divisibility by 3. However, 9 is a multiple of 3 that is not divisible 
by 6, so that divisibility by 3 does not necessitate divisibility by 6. Therefore, p is 
stronger than q. 

2. Factoring and solving the equation in p yields x = 0 or x = 2. Thus, if x = 0 or
x = 2, then x ∈ {0, 2, 4}. But just because x ∈ {0, 2, 4}, it does not follow that
x³ − 4x² + 4x = 0; for example, x = 4 gives 16. Thus p is stronger than q.

3. A particular value of x will either make p and q both true or both false. Thus,
they are logically equivalent.
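
Parts 1 and 2 of Example 1.2.10 can be explored numerically. The sketch below is our own illustration, and it only tests a finite sample of integers (the range −12 to 12 is an arbitrary choice), so it can suggest which statement is stronger but cannot prove it. It collects the values that make each statement true and checks whether one collection is a proper subset of the other.

```python
# An arbitrary finite sample of integers; an assumption of this sketch
sample = range(-12, 13)

# Part 1: divisibility by 6 versus divisibility by 3
divisible_by_6 = {x for x in sample if x % 6 == 0}
divisible_by_3 = {x for x in sample if x % 3 == 0}
part1_p_stronger = divisible_by_6 < divisible_by_3   # proper subset test

# Part 2: solutions of the cubic versus membership in {0, 2, 4}
cubic_solutions = {x for x in sample if x**3 - 4 * x**2 + 4 * x == 0}
listed_values = {x for x in sample if x in {0, 2, 4}}
part2_p_stronger = cubic_solutions < listed_values   # proper subset test

print(sorted(cubic_solutions), part1_p_stronger, part2_p_stronger)
```

Whenever p is stronger than q, every value that makes p true also makes q true, while some value makes q true without making p true. That is exactly what a proper subset expresses.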

EXERCISES 


1. The following sentences are alternative ways of expressing an idea of the form p → q
or p ↔ q. For each, define statements p and q and note whether p → q or p ↔ q
conveys the same sense as the sentence.

(a) On a clear day, you can see forever.
(b) I get nervous whenever you look at me that way.
(c) Every time I turn my back, you sneak away.
(d) Only fools fall in love.
(e) There are no refunds on sale items.
(f) Every time a bell rings, an angel gets his wings.
(g) I’ll wait for you at home, unless you call, in which case I’ll leave.
(h) The only solutions to the equation x² − x = 0 are nonnegative.
(i) It only hurts when I laugh.⁷
(j) Unless you follow the instructions, the cake won’t turn out right. (Careful: Just because
you follow the instructions doesn’t guarantee success. A sudden earthquake
could cause it to fall.)


2. Use the information from Table 1.10 to create three statements that are equivalent to
p ↔ q other than (p → q) ∧ (q → p).


3. For each of the following pairs of statements, use a truth table to determine whether 
they are logically equivalent, one is stronger than the other, or neither. 


⁷Is this saying that all laughter is painful?


(a) p; p ∨ q

(b) p → r; (p → q) ∧ (q → r)

(c) (p ∨ q) → r; (p → r) ∨ (q → r)

(d) (p ∨ q) → r; (p → r) ∧ (q → r)

(e) p → (q ∧ r); (p → q) ∧ (p → r)

(f) pvgs (PV q) A774 

(g) p → (q ∨ r); (p ∧ ¬q) → r

(h) (p ∧ q) → r; (p ∧ ¬r) → ¬q

Gi) PCQODATSS) (PYT)e|Vs) 
Gj) Pog npg 
4. In Example 1.2.9, we noted that p → r is stronger than (p ∧ q) → r, while p is
weaker than p ∧ q. This exercise investigates the reason why an implication statement
is strengthened when its hypothesis conditions are weakened.


(a) Suppose p and q are two statements, and p is stronger than q. What must be true
about the truth table entries for p as compared to those for q?⁸
(b) If p is stronger than q, why does that make p → r weaker than q → r?⁹


(c) For each of the following pairs of statements, which is stronger? Explain without using a
truth table.


i. p → r; (p ∨ q) → r
ii. [p ∧ (q → s)] → t; {p ∧ [(q ∧ r) → s]} → t


1.3 Universal and Existential Quantifiers 


The if-then language of Section 1.2 is only one way to address whether the truth of 
one statement necessitates the truth of another. In this section, we analyze a language 
construct using words like ALL and SOME. These words are called quantifiers. It goes 
something like this. Consider the two statements: 


If x is a square, then x is a rectangle. 
All squares are rectangles. 


They say the same thing, but the latter does not seem to be a compound statement until 
we rephrase it in the form p → q.


1.3.1 The universal quantifier 


Let’s take the story of Mr. S and construct a whole slew of statements, one for each day he
went to the post office, and see if cloudy days were always associated with his wearing of
rain gear. We’re not investigating whether he dressed this way on clear days; sometimes
he did and sometimes he did not. Let’s put it together in the following way.


⁸The set of F entries for one must be a proper subset of the F entries for the other.
⁹Use your answer to part (a), and the definition of p → q.

Suppose there were n days that Mr. S hitched a ride to the post office. For each of 
these days, construct the following statements: 


p₁: Day 1 was cloudy.
q₁: Mr. S wore rain gear on day 1.
p₂: Day 2 was cloudy.
q₂: Mr. S wore rain gear on day 2.    (1.3)
⋮
pₙ: Day n was cloudy.
qₙ: Mr. S wore rain gear on day n.


If we consider the statements ¬pₖ ∨ qₖ (⇔ pₖ → qₖ) for all 1 ≤ k ≤ n, we can link
them together to form a huge compound statement that expresses the idea that for every
cloudy day, Mr. S wore his rain gear. In the language of ∧ and ∨, to say that Mr. S dressed
for rain on every cloudy day could be expressed as


(¬p₁ ∨ q₁) ∧ (¬p₂ ∨ q₂) ∧ (¬p₃ ∨ q₃) ∧ ··· ∧ (¬pₙ ∨ qₙ)    (1.4)

or more succinctly as


⋀_{k=1}^{n} (¬pₖ ∨ qₖ) = ⋀_{k=1}^{n} (pₖ → qₖ).    (1.5)


The only way statement 1.5 can be true is if every pₖ → qₖ component is true.
We introduce new language and symbols to facilitate such a statement. Consider the 
following: 


For every day that it was cloudy, Mr. S wore his rain gear; and 
for all k (1 ≤ k ≤ n) such that day k was cloudy, Mr. S wore his rain gear on day k.


The expressions “for every” and “for all” are called universal quantifiers. The shorthand
mathematical notation for this is ∀.

Let’s formalize this further. Let C be the set of all k (1 ≤ k ≤ n) such that day k was
cloudy, that is, C = {k : 1 ≤ k ≤ n and day k was cloudy}. Similarly, let R = {k : 1 ≤
k ≤ n and Mr. S dressed for rain on day k}. Then we can reword our statement as


For every k ∈ C, k ∈ R


which we can write mathematically in either of the following ways:


(∀k ∈ C)(k ∈ R);
(∀k)(k ∈ C → k ∈ R).


The most general form for a statement involving the universal quantifier would look
something like this. If P(x) is some property stated in terms of x, such as “x ∈ C → x ∈ R,”
then a general statement involving the universal quantifier would be written


(∀x)(P(x))    (1.6)


and would be read, “For all x, P(x).” As we said at the beginning of this section, using
the universal quantifier is a fancy way of saying “If it was a cloudy day, then Mr. S dressed
for rain.” That is true, but we'll see in Section 1.4 why we need to understand a statement with
the universal quantifier as an ∧-linking of individual pₖ → qₖ statements.
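
The idea of a universal statement as one long chain of ANDs has a direct counterpart in Python's built-in all function. In the sketch below, the five days of weather records are invented purely for illustration, and days are numbered from 1 as in the story.

```python
# Hypothetical records for n = 5 days (made up for this sketch)
cloudy = [True, False, True, False, True]      # p_1, ..., p_5
rain_gear = [True, True, True, False, True]    # q_1, ..., q_5
n = len(cloudy)

# The AND-linking of (NOT p_k) OR q_k over all days, as in statement (1.5)
conjunction = all((not cloudy[k]) or rain_gear[k] for k in range(n))

# The same claim in set language: for every k in C, k is in R
C = {k + 1 for k in range(n) if cloudy[k]}       # cloudy days
R = {k + 1 for k in range(n) if rain_gear[k]}    # days dressed for rain
for_every_form = all(k in R for k in C)

print(conjunction, for_every_form, C, R)
```

Both forms give the same answer for these records. A single cloudy day without rain gear would make both of them false.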


1.3.2 The existential quantifier 

Now let’s go back to our story and consider the clear days. One thing that made Mr. Shephard’s
antics so amusing to me was that I seem to remember seeing him dressed for rain
on a clear day, at least once, that is. We want to build a logical statement that expresses
symbolically the thought that there existed at least one clear day when he dressed for
rain. How can we consider each day, note whether it was clear and whether we found
Mr. Shephard dressed for rain and then address the issue of whether this odd combination
of circumstances happened at least once?¹⁰

The answer is


(¬p₁ ∧ q₁) ∨ (¬p₂ ∧ q₂) ∨ ··· ∨ (¬pₙ ∧ qₙ) = ⋁_{k=1}^{n} (¬pₖ ∧ qₖ).    (1.7)


If even for one day both ¬pₖ and qₖ are true, then Eq. (1.7) will be true. Using our
previous notation, and letting S be the set of all clear (sunny) days, we can express this
in any of the following ways:


There exists a day such that it was clear and Mr. S dressed for rain.


There exists k ∈ S such that k ∈ R.


The expression “there exists,” written mathematically as ∃, is called the existential
quantifier. Thus the existential statement could be written:


(∃k ∈ S)(k ∈ R)    (1.8)
(∃k)(k ∈ S ∧ k ∈ R).    (1.9)


The most general form of an existential statement involving a property P(x) would 
be written 


(∃x)(P(x))    (1.10)


and would be read, “There exists x such that P(x).” Note how a statement with the 
existential quantifier is a ∨-linking of many individual statements.
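
Just as a universal statement behaves like a chain of ANDs, an existential statement behaves like a chain of ORs, and Python's built-in any function plays that role. The weather records below are hypothetical, invented only for this sketch.

```python
# Hypothetical records for n = 5 days (made up for this sketch)
cloudy = [True, False, True, False, True]      # p_1, ..., p_5
rain_gear = [True, True, True, False, True]    # q_1, ..., q_5
n = len(cloudy)

# The OR-linking of (NOT p_k) AND q_k over all days, as in statement (1.7)
disjunction = any((not cloudy[k]) and rain_gear[k] for k in range(n))

# The same claim in set language: there exists k in S such that k is in R
S = {k + 1 for k in range(n) if not cloudy[k]}   # clear days
R = {k + 1 for k in range(n) if rain_gear[k]}    # days dressed for rain
there_exists_form = any(k in R for k in S)

print(disjunction, there_exists_form, S & R)
```

One clear day with rain gear is enough to make the statement true, no matter what happened on the other days.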


¹⁰The hint is in the sentence itself. Use ∨ to join a bunch of pieces that use ∧.


Example 1.3.1. Formalize the following into logical statements involving ∀, ∃, ∧, and ∨:


1. Forallx € A,x >. 


2. There exists x € A such that either x < 2 orx > x”. 


3. There exists x ∈ N such that, for all y ∈ N, x < y.


Solution: 

1. (Wx €A)\(X > 7). 

2. (Ax € A(x <a) V (x > a”)]. 
3. (∃x ∈ N)(∀y ∈ N)(x < y).


1.3.3 Unique existence


Sometimes it’s important to know not only that something exists, but that exactly one 
such thing exists. To communicate that exactly one thing with a certain property exists, 
we say that it exists uniquely. The mathematical statement 


(∃!x)(P(x))    (1.11)


is read “There exists a unique x such that P(x).”

How do we alter the statement (∃x)(P(x)) to include the additional stipulation that
there is no more than one such x?¹¹ The standard way of defining unique existence is the
following.


Definition 1.3.2. The statement (∃!x)(P(x)) is defined by
[(∃x)(P(x))] ∧ [(P(x₁) ∧ P(x₂)) → (x₁ = x₂)].    (1.12)


Thus, unique existence is really a compound AND statement. First of all, it means
that some x exists with the property P(x), but also, if we assume that x₁ and x₂ both
have the property, then in reality x₁ and x₂ are the same thing.
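
Over a finite collection of candidates, the two halves of Definition 1.3.2 can each be tested directly. The Python sketch below is our own illustration. The function name exists_unique and the choice of the integers from −10 to 10 as the domain are assumptions made for the example, so it says nothing about real numbers outside that sample.

```python
def exists_unique(domain, has_property):
    """Check both parts of unique existence over a finite domain."""
    witnesses = [x for x in domain if has_property(x)]
    # Part 1: some x with the property exists
    exists = len(witnesses) >= 1
    # Part 2: any x1 and x2 that both have the property are the same thing
    at_most_one = all(x1 == x2 for x1 in witnesses for x2 in witnesses)
    return exists and at_most_one

domain = range(-10, 11)   # an arbitrary finite sample of integers

cube_is_8 = exists_unique(domain, lambda x: x**3 == 8)     # only x = 2
square_is_4 = exists_unique(domain, lambda x: x**2 == 4)   # x = -2 and x = 2
square_is_5 = exists_unique(domain, lambda x: x**2 == 5)   # no integer works

print(cube_is_8, square_is_4, square_is_5)
```

The second case fails the uniqueness half of the definition, and the third fails the existence half.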


Example 1.3.3. Reword the following statements of unique existence in a form like that 
in Definition 1.3.2. 


1. There exists a unique real number $x$ such that $x^3 = 8$.


2. For all $a \in \mathbb{R}$, there exists a unique $b \in \mathbb{R}$ such that $a + b = 0$.


Solution: 


1. There exists a real number $x$ such that $x^3 = 8$, and if $x_1$ and $x_2$ are real numbers
such that $x_1^3 = 8$ and $x_2^3 = 8$, then $x_1 = x_2$.


2. For all $a \in \mathbb{R}$, there exists $b \in \mathbb{R}$ such that $a + b = 0$, and if $b_1, b_2 \in \mathbb{R}$ satisfy
$a + b_1 = 0$ and $a + b_2 = 0$, then $b_1 = b_2$. ■


¹¹Think about what would have to be true of $x_1$ and $x_2$ if both $P(x_1)$ and $P(x_2)$ are true.


EXERCISES 


1. Express the following statements in the language of universal and existential quantifiers.
(a) If $x \in A$, then $x \notin B$.
(b) Everyone in the class is present.
(c) No element of the set $A$ exceeds $m$.
(d) If $n > 5$, then $n^2 < 2^n$.
(e) $A \cap B = \emptyset$.
(f) The faculty resolution passed unanimously.
(g) Only fools fall in love.
(h) Dan has never made an A in a history class.
(i) Every time a bell rings, an angel gets his wings.
(j) The only solutions to the equation $x^2 - x = 0$ are nonnegative.
(k) The equation $x^2 + 1 = 0$ has no solution in $\mathbb{R}$.
(l) If $n$ is an odd integer, then $n^2$ is an odd integer.¹²
(m) If $n \ge 3$, then the equation $x^n + y^n = z^n$ has no integer solutions $x, y, z \in \mathbb{Z}$.

2. Reword each unique existence statement that follows in a form like that of Definition 1.3.2.

(a) There exists a unique $x \in A \cap B$.

(b) There exists a unique $x \in \mathbb{R}$ such that $x > 1$ and $x$ is a factor of $p$.

(c) The curves $x^2 + y^2 = 1$ and $15y = (x - 11/5)^2$ intersect at exactly one point
in the $xy$-plane.


1.4 Negations of Statements 


In Section 1.1 we made our first mention of the negation, or denial, of a statement. Given
a statement $p$, the defining characteristic of its negation $\neg p$ is that the truth table values
are exactly the opposite. In this section, we see how to construct negations of compound
statements. Then, if someone makes an ugly statement such as


\[ (p \wedge q) \to (r \vee s) \]


and we want to add NOT to it, we will then have a way of writing


\[ \neg[(p \wedge q) \to (r \vee s)] \]


in a more useful form.


¹²The terms even and odd have not been defined yet, though you undoubtedly are familiar with the
terms. We'll define them precisely in Section 2.7.


1.4.1 Negations of $p \wedge q$ and $p \vee q$


In Exercise 4 in Section 1.1, you showed that $\neg(p \wedge q)$ is logically equivalent to $\neg p \vee \neg q$,
and that $\neg(p \vee q)$ is logically equivalent to $\neg p \wedge \neg q$. These facts are called DeMorgan’s
laws, and they provide us a way of expressing the negations of $\wedge$ and $\vee$ compound
statements.


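Example 1.4.1. Construct a negation of the following statements:
1. $p \wedge (q \wedge r)$
2. $(p \vee q) \wedge (r \vee s)$
Solution: Applying DeMorgan’s laws, we have


1. $\neg[p \wedge (q \wedge r)] \Leftrightarrow \neg p \vee \neg(q \wedge r) \Leftrightarrow \neg p \vee (\neg q \vee \neg r)$


2. $\neg[(p \vee q) \wedge (r \vee s)] \Leftrightarrow \neg(p \vee q) \vee \neg(r \vee s) \Leftrightarrow (\neg p \wedge \neg q) \vee (\neg r \wedge \neg s)$. ■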


Notice that part 1 of Example 1.4.1 is the same as Exercise 4c from Section 1.1. In
that exercise you used a truth table, but in Example 1.4.1, we used $\neg(p \wedge q) \Leftrightarrow \neg p \vee \neg q$
as a basis for manipulation of logical symbols.
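
If you want to double-check a symbolic manipulation like this one, a brute-force truth table is easy to automate. The sketch below assumes the two statements of Example 1.4.1 are $p \wedge (q \wedge r)$ and $(p \vee q) \wedge (r \vee s)$; the helper name `equivalent` is ours, not part of the text.

```python
from itertools import product

def equivalent(f, g, n):
    """True when f and g agree on all 2**n truth assignments."""
    return all(f(*v) == g(*v) for v in product([False, True], repeat=n))

# Part 1: not[p and (q and r)]  versus  (not p) or ((not q) or (not r))
part1 = equivalent(
    lambda p, q, r: not (p and (q and r)),
    lambda p, q, r: (not p) or ((not q) or (not r)),
    3,
)

# Part 2: not[(p or q) and (r or s)]  versus  (not p and not q) or (not r and not s)
part2 = equivalent(
    lambda p, q, r, s: not ((p or q) and (r or s)),
    lambda p, q, r, s: ((not p) and (not q)) or ((not r) and (not s)),
    4,
)

print(part1, part2)
```

A check like this confirms the equivalence row by row, but it does not replace the symbolic argument, which is what carries over to statements with quantifiers.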


Example 1.4.2. Use DeMorgan’s laws to express in words a negation of the following 
statements: 


1. Jacob has brown hair and blue eyes. 
2. Christina will either fax or e-mail the information to you. 


3. Megan is at least 25 years old, she has a valid driver’s license, and she either has her 
own insurance or has purchased coverage from the car rental company. 


Solution: Don’t hesitate to word the statements in a way that makes them easy to 
understand: 


1. Either Jacob does not have brown hair, or he does not have blue eyes. 
2. Christina will not fax you the information, and she will not e-mail it to you either. 


3. Either Megan is under 25, or she doesn’t have a valid driver’s license, or she has 
no insurance of her own and has not purchased coverage from the car rental 
company. ■




1.4.2 Negations of $p \to q$


In Section 1.2 we defined $p \to q$ to be logically equivalent to $\neg p \vee q$. To construct a
negation of $p \to q$, we can use this fact with DeMorgan’s law:


\[ \neg(p \to q) \Leftrightarrow \neg(\neg p \vee q) \Leftrightarrow \neg(\neg p) \wedge \neg q \Leftrightarrow p \wedge \neg q. \]


This might be a little confusing at first, but later you'll want to think of it in the following
way. If someone makes a claim that $p$ implies $q$, then they are, in effect, claiming that the
truth of $p$ is always accompanied by the truth of $q$. If you want to deny that claim, then
your task will be to exhibit a situation where $p$ is true and $q$ is false. The techniques that
follow for negating $\forall$ and $\exists$ statements will help clear that up.


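Example 1.4.3. Construct a negation of the following statements:
1. $p \to (q \wedge r)$
2. $(p \to q) \vee (r \to s)$
Solution:


1. $\neg[p \to (q \wedge r)] \Leftrightarrow p \wedge \neg(q \wedge r) \Leftrightarrow p \wedge (\neg q \vee \neg r)$


2. $\neg[(p \to q) \vee (r \to s)] \Leftrightarrow \neg(p \to q) \wedge \neg(r \to s) \Leftrightarrow (p \wedge \neg q) \wedge (r \wedge \neg s)$ ■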
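
The claim that denying $p \to q$ means exhibiting a case where $p$ is true and $q$ is false can be seen directly in a four-row truth table. Here is a minimal sketch; the function name `implies` is our own shorthand for the definition of $p \to q$ as $\neg p \vee q$.

```python
from itertools import product

def implies(p, q):
    # p -> q is taken to mean (not p) or q, as in Section 1.2
    return (not p) or q

rows = list(product([False, True], repeat=2))

# not(p -> q) agrees with (p and not q) on every row
negation_ok = all((not implies(p, q)) == (p and not q) for p, q in rows)

# The rows that falsify p -> q
falsifying = [(p, q) for p, q in rows if not implies(p, q)]

print(negation_ok, falsifying)
```

Only one row lands in `falsifying`, which is why a single situation with $p$ true and $q$ false is enough to deny an implication.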


Example 1.4.4. State in words a negation of the following statements:
1. If it was a cloudy day, then Mr. S dressed for rain.


2. If Megan rents a car, then she either has her own insurance or buys coverage from
the car rental company.


Solution:
1. It was a cloudy day, and Mr. S did not dress for rain.


2. Megan rented a car, but she neither has her own insurance, nor has she purchased
coverage from the car rental company. ■


1.4.3 Negations of statements with $\forall$ and $\exists$


Suppose someone makes the following claim: 
Every person in the class passed the first exam. 


For you to deny this claim, what sort of statement would you make?¹³ You would probably
say something like


At least one person in the class failed the first exam.


¹³One slacker is all it takes.

We can formalize the language further. If we let $C$ be the set of students in the class and $P$
be the set of students in the class who passed the first exam, we can write the original
statement as


\[ (\forall x \in C)(x \in P) \tag{1.13} \]
and its negation would be
\[ (\exists x \in C)(x \notin P). \tag{1.14} \]
Using the property notation $P(x)$, we can say the following:
\[ \neg[(\forall x)(P(x))] \Leftrightarrow (\exists x)(\neg P(x)). \tag{1.15} \]
Thus, the trick to negating a $\forall$ statement is that the $\neg$ symbol crawls over the $\forall$, and
converts it to $\exists$ as it goes.
Example 1.4.5. State in words a negation of the following statements:
1. For all $x \in \mathbb{Z}$, $x^2 \ge 0$.
2. For every $x \in \mathbb{R}$, either $x$ is rational or $x$ is irrational.
3. For all $x \in \mathbb{Z}$, if $x$ is divisible by 6, then $x$ is divisible by 3.
Solution:
1. There exists $x \in \mathbb{Z}$ such that $x^2 < 0$.
2. There exists $x \in \mathbb{R}$ such that $x$ is not rational and $x$ is not irrational.


3. There exists $x \in \mathbb{Z}$ such that $x$ is divisible by 6 but not divisible by 3. ■


Here’s another way to understand the denial of a statement involving $\forall$. In our
example of Mr. Shephard’s dressing habits in Section 1.2, we defined statements $p_k$ and
$q_k$ for $1 \le k \le n$ in the following way:


$p_k$: Day $k$ was cloudy.
$q_k$: Mr. S wore rain gear on day $k$.


Remember that the statement “For all $k$ ($1 \le k \le n$), $p_k \to q_k$” can be written as
\[ \bigwedge_{k=1}^{n} (p_k \to q_k) \quad \text{or} \quad \bigwedge_{k=1}^{n} (\neg p_k \vee q_k) \tag{1.16} \]


as in Eq. (1.5). Applying an extended form of DeMorgan’s law to this, we would have


\[ \neg\left[\bigwedge_{k=1}^{n} (p_k \to q_k)\right] \Leftrightarrow \neg\left[\bigwedge_{k=1}^{n} (\neg p_k \vee q_k)\right] \Leftrightarrow \bigvee_{k=1}^{n} \neg(\neg p_k \vee q_k) \Leftrightarrow \bigvee_{k=1}^{n} (p_k \wedge \neg q_k). \tag{1.17} \]


Now if statement 1.16 is true, then every $\neg p_k \vee q_k$ component is true. Thus every $p_k \wedge \neg q_k$
in statement 1.17 is false, so that $\bigvee_{k=1}^{n} (p_k \wedge \neg q_k)$ is false. Conversely, if statement 1.16 is
false, it’s because at least one $\neg p_k \vee q_k$ is false. Thus at least one $p_k \wedge \neg q_k$ in statement 1.17
is true, which is sufficient for $\bigvee_{k=1}^{n} (p_k \wedge \neg q_k)$ to be true. So statement 1.16 is true if and
only if statement 1.17 is false, and they are indeed negations of each other.
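
The argument just given can be checked exhaustively for a small number of days. The sketch below fixes $n = 3$ (an arbitrary choice of ours) and uses Python's `all` and `any` to play the roles of the big $\wedge$ and big $\vee$; the function names are hypothetical labels for statements 1.16 and 1.17.

```python
from itertools import product

n = 3  # number of days; an arbitrary small value for illustration

def statement_116(p, q):
    # Big AND over k of (not p_k or q_k)
    return all((not p[k]) or q[k] for k in range(n))

def statement_117(p, q):
    # Big OR over k of (p_k and not q_k)
    return any(p[k] and not q[k] for k in range(n))

# Try every assignment of truth values to p_1..p_n and q_1..q_n.
always_opposite = all(
    statement_116(v[:n], v[n:]) != statement_117(v[:n], v[n:])
    for v in product([False, True], repeat=2 * n)
)

print(always_opposite)
```

Because the two functions never agree on any assignment, each is a negation of the other, at least for this value of $n$.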
Now we negate statements involving $\exists$. Consider the following:


Someone traveling in my car didn’t chip in enough money for gas.


If we let $C$ be the set of people riding in my car and $U$ be the set of underpayers, the
statement becomes


\[ (\exists x \in C)(x \in U). \]
How would the negation of this statement read?¹⁴ You could say
Everyone traveling in my car chipped in enough money for gas,
or
\[ (\forall x \in C)(x \notin U). \]


Notice that the existential quantifier $\exists$ became the universal quantifier $\forall$, and the defining
characteristic $x \in U$ was negated. Thus we arrive at the following:


\[ \neg[(\exists x)(P(x))] \Leftrightarrow (\forall x)(\neg P(x)). \tag{1.18} \]


This looks similar to statement 1.15 in that negating a $\exists$ statement can be done by letting
the $\neg$ symbol crawl over the $\exists$, changing it to $\forall$ as it goes.

If someone says to you that a certain something exists, and it is a claim you want 
to deny, then avoid using the expression “There does not exist.” Instead, express your 
negation in positive language by saying that, for every case, the claimed characteristic is 
not present. 
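
On a finite set, `any` behaves like $\exists$ and `all` behaves like $\forall$, so the carpool example can be acted out in a few lines. The rider names and the set of underpayers below are invented purely for illustration.

```python
C = {"Ann", "Bo", "Cy"}   # hypothetical riders in the car
U = {"Bo"}                # hypothetical underpayers

# (exists x in C)(x in U): someone did not chip in enough
claim = any(x in U for x in C)

# (forall x in C)(x not in U): everyone chipped in enough
denial = all(x not in U for x in C)

print(claim, denial)
```

Whatever sets you substitute for `C` and `U`, `denial` always comes out as the opposite of `claim`, and it is phrased positively, as a statement about every rider.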

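Example 1.4.6. Construct a negation of the following statements:
1. $(\exists x \in \mathbb{N})(x \le 0)$.


2. $(\forall \epsilon > 0)(\exists n \in \mathbb{N})(1/n < \epsilon)$.
Solution:


1. $(\forall x \in \mathbb{N})(x > 0)$.
2. $(\exists \epsilon > 0)(\forall n \in \mathbb{N})(1/n \ge \epsilon)$. ■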
Example 1.4.7. State in words a negation of the following statements:
1. Someone in this class cheated on the final exam.
2. For every $x \in A$, it is true that $x \in B$.
3. There exists $x \in \mathbb{N}$ such that, for all $y \in \mathbb{N}$, $x \le y$.
4. For every integer $x$, either $x$ is even or $x$ is odd.


¹⁴Avoid saying “no one” if possible. It’s better to say what everyone did instead of what no one did.

Solution:
1. Everyone in this class did not cheat on the final exam.
2. There exists $x \in A$ such that $x \notin B$.
3. For every $x \in \mathbb{N}$, there exists $y \in \mathbb{N}$ such that $x > y$.
4. There exists $x \in \mathbb{Z}$ such that $x$ is not even and $x$ is not odd. ■


One final note. Statements with the universal quantifier $\forall$ can often be written in
if-then form, and vice versa. Which form is preferred will depend on the context we’re 
working in. The point to be made here is that negating an if-then statement will produce an 
existence statement, so you might want to construct the negation of an if-then statement 
by first wording it in terms that use the universal quantifier. 

Example 1.4.8. The statement 
If it is a cloudy day, then Mr. S dresses for rain, 
is the same as 
For every cloudy day, Mr. S dresses for rain. 
Similarly the statement 
For every x in the class, x passed the first exam, 
is the same as 


If x is in the class, then x passed the first exam. 


Example 1.4.9. State in words a negation of the following statements:
1. If $-1 < x < 10$, then $x^2 < 100$.
2. There exists $M \in \mathbb{R}$ such that, for all $x \in S$, we have that $x \le M$.
3. For all $a \in A$ and $\epsilon > 0$, there exists $\delta > 0$ such that, if $0 < |x - a| < \delta$, then
$|f(x) - L| < \epsilon$.
Solution:
1. There exists $x$ such that $-1 < x < 10$ and $x^2 \ge 100$.
2. For all $M \in \mathbb{R}$, there exists $x \in S$ such that $x > M$.


3. There exist $a \in A$ and $\epsilon > 0$ such that, for all $\delta > 0$, there exists an $x$ satisfying
$0 < |x - a| < \delta$ and $|f(x) - L| \ge \epsilon$. ■


EXERCISES 


1. Construct a negation of each of the following statements:
(a) $p \vee (q \wedge r)$
(b) $(p \vee q) \to r$
(c) $p \to (q \vee r)$
(d) $p \leftrightarrow q$


2. Construct a negation of each of the following statements:
(a) If $x \in A$, then $x \notin B$.
(b) Everyone in the class is present.
(c) No element of the set $A$ exceeds $m$.
(d) If $n > 5$, then $n^2 < 2^n$.
(e) $A \cap B = \emptyset$.
(f) The faculty resolution passed unanimously.
(g) Only fools fall in love.
(h) Dan has never made an A in a history class.
(i) Every time a bell rings, an angel gets his wings.
(j) The only solutions to the equation $x^2 - x = 0$ are nonnegative.
(k) It only hurts when I laugh.
(l) If $n$ is an odd integer, then $n^2$ is an odd integer.
(m) The equation $x^2 + 1 = 0$ has no solution in $\mathbb{R}$.
(n) Every integer is either even or odd, but not both.
(o) If $n \ge 3$, then the equation $x^n + y^n = z^n$ has no integer solutions $x, y, z \in \mathbb{Z}$.
(p) The equation $x^2 = 10$ has a real number solution.
(q) There exist $M_1, M_2 \in \mathbb{R}$ such that, for all $x \in S$, $M_1 \le x \le M_2$.
(r) There exists a unique $x \in A \cap B$.


It’s time to begin applying the language and logic of Chapter 1 to proof writing. A good
place to do this is with sets. In this section, we address two things. First, we return to the set 
terminology from Chapter 0, and we use it to practice some of the concepts we’ve learned 
so far. Second, we get our feet wet by beginning to write proofs. Right off the bat, we'll 
see three useful techniques for writing proofs: direct proofs; proofs by contrapositive; 
and proofs by contradiction. 


2.1.1 Terms involving sets 


Define the word definition. The self-referencing nature of this request might make it seem
like a hard thing to do, but surely we would agree that a basic feature of a definition is 
that it gives us a way of substituting a single word for a whole phrase. The single word 
being defined is created for the express purpose of being equivalent to that phrase. You 
could say: 


Jessica went to Kenya and contracted one of a group of diseases, usually intermittent or 
remittent, characterized by attacks of chills, fever, and sweating, caused by a parasitic 
protozoan, which is transferred to the human bloodstream by a mosquito of the genus 
Anopheles and which occupies and destroys red blood cells. 


Or you could say simply that Jessica went to Kenya and contracted malaria. Malaria is 
defined to be that disease. You can get by without the word malaria if you want, but then 
you have to use a whole lot of words every time you want to address this disease (which 
we hope is not very often), and you won’t get practice in logic. For, you see, a definition
is an axiomatic statement of logical equivalence. In this case, we’re declaring that the
statement


$x$ is malaria $\Leftrightarrow$ $x$ is a disease characterized by...


is a tautology. So, if x is malaria, then x is a disease with those scary symptoms, and if x 
is a disease with those scary symptoms, then x is malaria. 

In Section 0.1 we mentioned some terminology relating to sets. Let’s return to these 
definitions to bring a bit more sophistication to them, and present some new ones. 


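Definition 2.1.1. Suppose $A$ and $B$ are sets. We say that $A$ is a subset of $B$, written $A \subseteq B$,
provided the statement “if $x \in A$, then $x \in B$” is true. That is,


\[ (A \subseteq B) \Leftrightarrow (x \in A \to x \in B) \Leftrightarrow (\forall x \in A)(x \in B). \tag{2.1} \]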


We will not always be so wordy in our definitions, but it’s helpful to be very thorough 
at first. 


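Definition 2.1.2. If $A$ and $B$ are sets, we say that $A = B$ provided $A \subseteq B$ and $B \subseteq A$.
That is,


\[ A = B \Leftrightarrow (A \subseteq B) \wedge (B \subseteq A) \Leftrightarrow (x \in A \to x \in B) \wedge (x \in B \to x \in A). \tag{2.2} \]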


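Definition 2.1.3. If $A$ and $B$ are sets, we define the sets


\[ A \cup B = \{x : x \in A \vee x \in B\} \tag{2.3} \]
\[ A \cap B = \{x : x \in A \wedge x \in B\}. \tag{2.4} \]

That is,
\[ x \in A \cup B \Leftrightarrow (x \in A \vee x \in B) \tag{2.5} \]
\[ x \in A \cap B \Leftrightarrow (x \in A \wedge x \in B). \tag{2.6} \]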


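Definition 2.1.4. If $A \cap B = \emptyset$, then $A$ and $B$ are said to be disjoint.


To say $A \cap B = \emptyset$ is to say that there does not exist any $x \in A \cap B$, or, if you prefer,
there does not exist $x \in A$ such that $x \in B$ also. That is,


\begin{align*}
A \cap B = \emptyset &\Leftrightarrow \neg(\exists x \in A \cap B) \\
&\Leftrightarrow \neg(\exists x \in A)(x \in B) \\
&\Leftrightarrow (\forall x \in A)(x \notin B) \\
&\Leftrightarrow x \in A \to x \notin B. \tag{2.7}
\end{align*}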


Notice that the disjointness of $A$ and $B$ is completely summed up by the last line of
statement (2.7). It isn’t necessary also to include $x \in B \to x \notin A$, because this is merely
the contrapositive of $x \in A \to x \notin B$.


[Figure 2.1: Venn diagram in the universal set $U$. Shaded region represents $A \setminus B$.]


[Figure 2.2: Venn diagram in the universal set $U$. Shaded region represents $A \triangle B$.]


We mention two other useful sets to construct from two sets $A$ and $B$. The difference
$A \setminus B$ is the set of all elements in $A$ that are not in $B$, and the symmetric difference
$A \triangle B$ is the set of elements that are either in $A$ or $B$, but not both. Venn diagrams of
these are sketched in Figs. 2.1 and 2.2, respectively. Before you read the definitions of
$A \setminus B$ and $A \triangle B$ that follow, see if you can construct them yourself using $\cap$ and $\cup$ and $'$
(complement).


Definition 2.1.5. If $A$ and $B$ are sets, we define the difference of $A$ and $B$ as
\[ A \setminus B = \{x : x \in A \text{ and } x \notin B\} = A \cap B'. \]


Definition 2.1.6. If $A$ and $B$ are sets, we define the symmetric difference of $A$ and $B$ as
\[ A \triangle B = \{x : x \in A \cap B' \text{ or } x \in A' \cap B\} = (A \cap B') \cup (A' \cap B). \]
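
Python's built-in `set` type happens to mirror these definitions closely, so it makes a handy scratchpad. The sketch below assumes a small universal set `U` and two sample sets of our own choosing; complements are taken relative to `U`, and the helper `complement` is our name, not standard notation.

```python
U = set(range(10))   # hypothetical universal set
A = {1, 2, 3, 4}
B = {3, 4, 5}

def complement(S):
    # Complement relative to the universal set U
    return U - S

subset = A <= B               # is A a subset of B?
union = A | B                 # A union B
intersection = A & B          # A intersect B
disjoint = A.isdisjoint(B)    # is A intersect B empty?
difference = A - B            # A \ B
sym_diff = A ^ B              # symmetric difference

# Compare the built-in operations with Definitions 2.1.5 and 2.1.6.
diff_matches = difference == A & complement(B)
sym_matches = sym_diff == (A & complement(B)) | (complement(A) & B)

print(difference, sym_diff, diff_matches, sym_matches)
```

Agreement on one pair of sets is evidence, not proof. The definitions are what guarantee the identities for all sets.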


Let’s practice some statement denials. See Exercise 1 for more similar examples.

Example 2.1.7. Use the preceding definitions to construct a negation of the following
statements:


1. $A \subseteq B$
2. $x \in A \cap B$
Solution:


1. The statement $A \subseteq B$ means that, for all $x \in A$, $x \in B$. Thus $A \nsubseteq B$ means that
there exists $x \in A$ such that $x \notin B$.


2. The statement $x \in A \cap B$ means that $x \in A$ and $x \in B$. Thus by DeMorgan’s
Law, $x \notin A \cap B$ means either $x \notin A$ or $x \notin B$. ■


2.1.2 Direct proofs 


Now let’s begin to write some proofs of theorems. In general, a theorem is a statement
of the form $p \to q$. The hypothesis $p$ will likely be a compound statement, and some of
its components might not even be explicitly stated. At the foundational level, our task in
writing the proof of a theorem is to show that $p \to q$ is a tautology. However, the final
product that we call a proof will be prose, and not look anything like the manipulation of
strings of symbols as we did in Chapter 1. In our first examples, we employ the method
of direct proof. The statement and proof of a theorem proved directly will always look
something like this.


Theorem 2.1.8 (Sample). If $p$, then $q$.
Proof: Suppose $p$. Then, .... Thus $q$. ■


Writing a proof of the theorem is merely the construction of a logical bridge from the 
hypothesis to the conclusion. In the example theorems that follow, we will dissect every 
statement and present every detail as completely as possible before we present a proof. 
This will help us make sure we understand all the logical internals. We'll then write the 
proof as a cleaned-up presentation of the argument for $p \to q$. As we get more practice,
we'll concentrate more on the actual writing of the proof, leaving the logical skeleton
concealed. For example, in Theorems 2.1.9 to 2.1.11, we use the following statement
definitions:


\[ p : x \in A \tag{2.8} \]
\[ q : x \in B. \tag{2.9} \]


Theorem 2.1.9. If $A \subseteq B$, then $A \cap B = A$.
Here are some preliminary thoughts before we present a proof: 


1. In some set proofs, you might want to draw a Venn diagram to convince yourself
that the theorem is true. If the Venn diagram is drawn so that $A$ is sketched to lie
completely inside $B$, then the result is apparent. However, the Venn diagram does
not constitute a proof. We have a long way to go before we earn the right to say “Proof
by picture.”


2. How do we state the hypothesis condition of Theorem 2.1.9 in terms of $p$ and $q$ in
the preceding?¹ Also, how do we state the conclusion $A \cap B = A$ in the same terms?²
The answers:


\[ \underbrace{(p \to q)}_{\text{hypothesis}} \to \underbrace{\{[(p \wedge q) \to p] \wedge [p \to (p \wedge q)]\}}_{\text{conclusion}}. \tag{2.10} \]


Take a moment to construct a truth table for statement (2.10) (see Exercise 2) and
you'll see that it is a tautology. Arguably, this exercise constitutes a proof, but we want
more than a proof in the formal language of logic. We want well-written prose.


3. The statement that we’re trying to prove in this theorem is an equation involving sets:
$A \cap B = A$. This set equality is itself a compound statement about subset inclusion:
$A \cap B \subseteq A$ and $A \subseteq A \cap B$. Thus writing our proof will be a two-step task. Whenever
we show that one set is a subset of another, say $A \cap B \subseteq A$, we’re proving an if-then
statement that says if $x$ is an element of one set ($A \cap B$), then $x$ is an element of
another set ($A$). Writing a subset inclusion proof of this sort would therefore read
something like “Suppose $x \in A \cap B$. Then .... Thus $x \in A$.” It’s common to see
proofs of this sort written using an expression such as “Pick $x \in A \cap B$,” which has
the imagery of reaching into $A \cap B$ and grabbing an arbitrary element, which we
call $x$. The only thing we know about $x$ is that it’s an element of the set where we
picked it, but somehow we must be able to conclude that $x$ is an element of $A$, the
set on the other side of the $\subseteq$ symbol. Proofs of this sort, which Theorems 2.1.9 and
2.1.10 will illustrate, are often called element-chasing proofs. They show that two sets
are equal by chasing an arbitrarily chosen element from one side to the other, then
back.


4. A proof is like a road map. The person sketching the map will provide more or less
detail depending on the traveler’s familiarity with the territory. Because we are just
getting started with proofs and want to write clearly and completely, our proofs (especially
at first) will often contain the inserted statement “We need to show that... ,”
so that the reader can always see where we’re going. Even for the most sophisticated
mathematical audience, it’s not uncommon to have such a phrase included in proofs
that are lengthy or complicated. You might want to make occasional use of this phrase
at first.
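
The truth table mentioned in the second preliminary thought can be generated mechanically. The sketch below assumes statement (2.10) has the form $(p \to q) \to \{[(p \wedge q) \to p] \wedge [p \to (p \wedge q)]\}$; the function names are ours.

```python
from itertools import product

def implies(p, q):
    # p -> q is taken to mean (not p) or q
    return (not p) or q

def statement_2_10(p, q):
    hypothesis = implies(p, q)                                   # A is a subset of B
    conclusion = implies(p and q, p) and implies(p, p and q)     # A intersect B equals A
    return implies(hypothesis, conclusion)

# One entry per row of the truth table, keyed by the values of (p, q).
table = {(p, q): statement_2_10(p, q)
         for p, q in product([False, True], repeat=2)}
is_tautology = all(table.values())

for row, value in table.items():
    print(row, value)
```

Seeing `True` in every row supports the claim that (2.10) is a tautology, though, as the text says, a finished proof should still be written in prose.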


Proof: Suppose $A \subseteq B$. To show $A \cap B = A$, we must apply Definition 2.1.2 and
show that $A \cap B \subseteq A$ and $A \subseteq A \cap B$.


($A \cap B \subseteq A$): Pick $x \in A \cap B$. Then $x \in A$ and $x \in B$. Since $x \in A$, we have
that $A \cap B \subseteq A$.


($A \subseteq A \cap B$): Pick $x \in A$. Then, because $A \subseteq B$, we have that $x \in B$.
Therefore, since $x \in A$ and $x \in B$, it follows that $x \in A \cap B$. Thus $A \subseteq A \cap B$.


Since we have shown that $A \cap B \subseteq A$ and $A \subseteq A \cap B$, we have demonstrated that
$A \cap B = A$. ■


¹See Definition 2.1.1.
²See Definitions 2.1.2 and 2.1.3.


The proof of Theorem 2.1.9 is painfully but intentionally wordy so that you can see 
how its structure derives from the definitions of the terms it uses. We'll quickly dispense 
with the frequent use of such phrases as “We need to show,” provided the previous 
definitions and theorems should make it obvious what we need to say next. Assuming 
the reader knows what is to be shown, we have the freedom to be more succinct. Thus a 
proof of Theorem 2.1.9 would read much better if written in the following way. 


Proof (Succinct): Suppose $A \subseteq B$, and pick $x \in A \cap B$. Then $x \in A$, so that
$A \cap B \subseteq A$. Now pick $x \in A$. Then since $A \subseteq B$, it is also true that $x \in B$, so that
$x \in A \cap B$. Thus $A \subseteq A \cap B$, so that $A \cap B = A$. ■


Theorem 2.1.10 (DeMorgan's Law). If $A$ and $B$ are sets, then


\[ (A \cap B)' = A' \cup B'. \tag{2.11} \]


Here are some preliminary thoughts about the logical structure of the theorem. 


1. The apparent if-then structure of the theorem includes the hypothesis condition “If
$A$ and $B$ are sets.” You might argue that the theorem could have been more simply
stated as $(A \cap B)' = A' \cup B'$, if the reader only knew that the context of the theorem
is sets. Fair enough. In logically analyzing this theorem, let’s strip away the phrase “If
$A$ and $B$ are sets,” and let the theorem simply say $(A \cap B)' = A' \cup B'$. By peeling
the phrase away, we have revealed an if-and-only-if statement: $x \in (A \cap B)'$ iff
$x \in A' \cup B'$.

2. Using $p$ and $q$ as defined in statement (2.8), this theorem looks a lot like DeMorgan’s
law for statements:


\[ \neg(p \wedge q) \Leftrightarrow (\neg p \vee \neg q). \tag{2.12} \]


In fact, restating Theorem 2.1.10 in terms of the $p$ and $q$ in statement (2.8), Theorem
2.1.10 becomes statement (2.12). In Exercise 4 from Section 1.1, you showed
that statement (2.12) is a tautology. Thus, in a sense, Theorem 2.1.10 is proved. Still,
we write it out in prose.


3. The proof of this theorem will reveal a fork in the road. There will come a point 
where we are given two possible cases to consider. We must follow each case to the 
end to show that they both will allow us to arrive at the conclusion. 


Proof: We show that $(A \cap B)' \subseteq A' \cup B'$ and $(A \cap B)' \supseteq A' \cup B'$.


($\subseteq$): Pick $x \in (A \cap B)'$. Then $x \notin A \cap B$, so that either $x \notin A$ or $x \notin B$. We
consider each case.


(Case $x \notin A$): If $x \notin A$, then $x \in A'$. Thus $x \in A' \cup B'$.
(Case $x \notin B$): If $x \notin B$, then $x \in B'$. Thus $x \in A' \cup B'$.
In either case, we have that $x \in A' \cup B'$, so that $(A \cap B)' \subseteq A' \cup B'$.

($\supseteq$): Pick $x \in A' \cup B'$. Then either $x \in A'$ or $x \in B'$. We consider each case.
(Case $x \in A'$): If $x \in A'$, then $x \notin A$. But if $x \notin A$, then certainly
$x \notin A \cap B$. Thus $x \in (A \cap B)'$.
(Case $x \in B'$): If $x \in B'$, then $x \notin B$. But if $x \notin B$, then certainly
$x \notin A \cap B$. Thus $x \in (A \cap B)'$.
In either case, we have that $x \in (A \cap B)'$, so that $A' \cup B' \subseteq (A \cap B)'$.


Since we have shown $(A \cap B)' \subseteq A' \cup B'$ and $(A \cap B)' \supseteq A' \cup B'$, we have that
$(A \cap B)' = A' \cup B'$. ■
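
A computer can test Theorem 2.1.10 on every pair of subsets of a small universal set. The sketch below assumes a four-element universe of our choosing; `subsets` and `complement` are hypothetical helper names. Passing this test is a sanity check on one universe, not a substitute for the element-chasing proof above.

```python
from itertools import combinations

U = frozenset({1, 2, 3, 4})   # hypothetical universal set

def subsets(S):
    """All subsets of S, as frozensets."""
    items = sorted(S)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def complement(S):
    # Complement relative to U
    return U - S

# Test (A intersect B)' == A' union B' for every pair of subsets of U.
holds = all(complement(A & B) == complement(A) | complement(B)
            for A in subsets(U) for B in subsets(U))

print(len(subsets(U)), holds)
```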


The fact that the empty set contains no elements can make for some interesting twists 
in proofs. 


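Theorem 2.1.11. For any set $A$, $A \cup \emptyset = A$.
Proof: We show $A \cup \emptyset \subseteq A$ and $A \cup \emptyset \supseteq A$.


($\subseteq$): Pick $x \in A \cup \emptyset$. Then either $x \in A$, or $x \in \emptyset$. But since $\emptyset$ contains no
elements, it must be that $x \in A$. Thus $A \cup \emptyset \subseteq A$.


($\supseteq$): Clearly, if $x \in A$, then $x \in A \cup \emptyset$. Thus $A \subseteq A \cup \emptyset$.
Therefore, $A \cup \emptyset = A$. ■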


2.1.3 Proofs by contrapositive 


In Section 1.2.2, we showed that $p \to q$ is logically equivalent to $\neg q \to \neg p$. This suggests
that a proof of a theorem of the form $p \to q$ might be approached contrapositively by
showing $\neg q \to \neg p$ instead. To prove the latter is equivalent to proving the former. In
general, a proof by contraposition will go something like this.


Theorem 2.1.12 (Sample). If $p$, then $q$.
Proof: Suppose $\neg q$. Then, .... Thus $\neg p$. ■


Whether it’s better to attack a theorem directly or contrapositively comes with ex- 
perience. In proving the following theorem contrapositively, we illustrate how natural it 
can be to show as a conclusion that something is nonempty, rather than supposing it is 
empty as a hypothesis condition. To see the reasonableness of Theorem 2.1.13, draw a 
Venn diagram first. 


Theorem 2.1.13. If $A \subseteq B$, then $A \setminus B = \emptyset$.


Proof: Suppose $A \setminus B \ne \emptyset$. Then there exists $x \in A \setminus B$. Thus $x \in A \cap B'$, so
that $x \in A$ and $x \in B'$. But the existence of such an $x$ is precisely the denial of
Definition 2.1.1, so that $A \nsubseteq B$. ■

2.1.4 Proofs by contradiction

Both direct proofs and contrapositive proofs are effectively an effort to show that $p \to q$
is a tautology. Sometimes, however, it’s easier to prove $p \to q$ is a tautology by showing
$\neg(p \to q)$ is a contradiction. In order to construct a proof by contradiction, we first have
to recall the negation of $p \to q$. It’s also helpful to remember that an if-then statement
might sometimes be better stated using the universal quantifier $\forall$. So, what is another
way to write $\neg(p \to q)$? Answer: $p \wedge \neg q$. Writing a proof by contradiction consists of
showing that $p \wedge \neg q$ is impossible. The following example theorems will all take this
form:


Theorem 2.1.14 (Sample). If $p$, then $q$.


Proof: Suppose $p$ and $\neg q$. Then .... This is a contradiction. Thus $p \to q$. ■


Theorem 2.1.15. If $A$ is any set, $\emptyset \subseteq A$.


Proof: Suppose there exists a set $A$ such that $\emptyset \nsubseteq A$. Then there exists $x \in \emptyset$ such
that $x \notin A$. But $\emptyset$ contains no elements. This contradicts the definition of $\emptyset$. Thus
$\emptyset \subseteq A$. ■


2.1.5 Disproving a statement 


Not only is it important to be able to prove the truth of statements, we sometimes find 
ourselves needing to demonstrate that a certain statement is not true. To disprove a 
statement is simply to prove its negation. Here’s an example of a statement that at first 
glance you might think is true but is not. Think about it before you check the solution, 
and see if you can demonstrate on your own that it is false. 


Example 2.1.16. If $A$ and $B$ are sets such that $A \subseteq B$, then $A$ and $B$ are not disjoint.
Solution: Logically this false statement says
\[ (\forall A)(\forall B)[(A \subseteq B) \to (A \cap B \ne \emptyset)]. \tag{2.13} \]
What is the negation of this statement? It is
\[ (\exists A)(\exists B)[(A \subseteq B) \wedge (A \cap B = \emptyset)]. \tag{2.14} \]


Disproving statement (2.13) is the same as proving statement (2.14), which means
our task is to demonstrate the existence of disjoint sets $A$ and $B$, where $A \subseteq B$. We
must create them. So how about letting $A = \emptyset$ and $B = \{1\}$. Clearly, $A \subseteq B$ and
$A \cap B = \emptyset$. ■
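
A counterexample is a concrete object, so it can be handed to a computer for confirmation. This short sketch uses Python's built-in sets with the same choice of $A = \emptyset$ and $B = \{1\}$; the variable names are ours.

```python
A = set()    # the empty set
B = {1}

is_subset = A <= B              # A is a subset of B
is_disjoint = A & B == set()    # A intersect B is empty

# Both conditions of statement (2.14) hold for this pair.
counterexample = is_subset and is_disjoint

print(is_subset, is_disjoint, counterexample)
```

One such pair is all that statement (2.14) asks for, which is why a single counterexample is enough to disprove the universal statement (2.13).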


Example 2.1.16 illustrates a very common occurrence in mathematics. Sometimes 
we might be tempted to believe that a certain statement is true, when in actuality it is not. 
If such a statement is a universal one as in Example 2.1.16, then to disprove the statement 
involves demonstrating the existence of what is called a counterexample.

EXERCISES


1. Construct the negation of each statement below, as was done in Example 2.1.7.
(a) $x \in A \cup B$
(b) $A$ and $B$ are disjoint.
(c) $A = B$
(d) $x \in A \setminus B$
(e) $x \in A \triangle B$
(f) $A \subseteq B \cup C$
(g) $A \cup B \subseteq C \cap D$
(h) If $A \cup C \subseteq B \cup C$, then $A \subseteq B$.


2. Construct a truth table for statement (2.10) to verify it is a tautology.


3. Show that the converse of statement (2.10) is also a tautology. State the converse of
Theorem 2.1.9 and prove it.


4. Prove the following.

(a) If $A \subseteq B$, then $A \cup B = B$.

(b) If $A \subseteq B$ and $B \subseteq C$, then $A \subseteq C$.

(c) If $A \subseteq B$, then $A' \supseteq B'$.

(d) (DeMorgan’s Law) $(A \cup B)' = A' \cap B'$.

(e) If $A \cap B = \emptyset$, then $A \triangle B = A \cup B$.


5. Suppose $A$, $B$, and $C$ are sets. Consider the following statements:


\[ A = B \Leftrightarrow A \cup C = B \cup C \tag{2.15} \]
\[ A \nsubseteq B \Leftrightarrow A \cap C \nsubseteq B \cap C. \tag{2.16} \]


For each of Eqs. (2.15) and (2.16), one direction of the implication $\Leftrightarrow$ is true and
one is false. Prove the direction that is true, and provide a counterexample for the
direction that is false.


6. Prove that $\cap$ distributes over $\cup$ and vice versa.


7. If $X$ and $Y$ are disjoint sets, we sometimes write $X \cup Y$ as $X \mathbin{\dot\cup} Y$. This is a way of
talking about the set $X \cup Y$ by tagging it with a little symbol (the dot) that tells the
reader the additional information that $X$ and $Y$ are disjoint. So if someone makes a
statement like


\[ A \cup B = A \mathbin{\dot\cup} (B \setminus A) \tag{2.17} \]
what he is really saying is the compound statement
\[ A \cup B = A \cup (B \setminus A) \quad \text{and} \quad A \cap (B \setminus A) = \emptyset. \tag{2.18} \]


Prove Eq. (2.17) by showing both parts of Eq. (2.18).


2.2 Indexed Families of Sets 




If you’re working with no more than three sets at a time as we did in Section 2.1, it’s
probably sufficient to use $A$, $B$, and $C$ to represent them. If you have a set of, say, 10 sets
(generally called a family or collection of sets instead of a set of sets), it might be more
sensible to put them into a family and address them as $A_1, A_2, \ldots, A_{10}$. In a case like
this, we would say that the set $\{1, 2, 3, \ldots, 10\}$ indexes the family of sets. If we write


\[ \mathbb{N}_n = \{1, 2, 3, \ldots, n\}, \]
then we could write a family of $n$ sets as
\[ \{A_1, A_2, A_3, \ldots, A_n\} = \{A_k : k \in \mathbb{N}_n\} = \{A_k\}_{k=1}^{n}, \]


and we would say that $\mathbb{N}_n$ is an index set for the family of sets. This notation has
advantages, for then we could write unions and intersections more succinctly:


\[ A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_n = \bigcup_{k=1}^{n} A_k \tag{2.19} \]

\[ A_1 \cap A_2 \cap A_3 \cap \cdots \cap A_n = \bigcap_{k=1}^{n} A_k. \tag{2.20} \]


In Definitions 2.2.3 and 2.2.4, we'll define precisely what we mean by this sort of union
and intersection.
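
To get a feel for the notation before the precise definitions arrive, here is a sketch of a finite indexed family in Python. The family itself is invented for illustration: we take the index set $\{1, 2, 3, 4\}$ and let $A_k$ be the multiples of $k$ that are less than 25. A dictionary plays the role of the indexing, with keys as the name tags.

```python
from functools import reduce

# Hypothetical family indexed by {1, 2, 3, 4}: A_k = multiples of k below 25
index_set = range(1, 5)
family = {k: {m for m in range(1, 25) if m % k == 0} for k in index_set}

# Union over k of A_k: everything that lies in at least one A_k
union = set().union(*family.values())

# Intersection over k of A_k: everything that lies in every A_k
intersection = reduce(set.intersection, family.values())

print(sorted(union))
print(sorted(intersection))
```

Notice that the index $k$ is never an element of the union or intersection in its role as an index. It only tells us which set of the family we are looking at.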

We can go even further. It is conceivable we might need to work with infinitely many
sets $\{A_1, A_2, A_3, \ldots\}$ that we might want to index with $\mathbb{N}$. For example, if we use the
familiar interval notation


\[ [a, b] = \{x : x \in \mathbb{R} \text{ and } a \le x \le b\}, \]


we might talk about the family of intervals $\mathcal{F} = \{A_n\}_{n \in \mathbb{N}}$, where $A_n = [0, 1/n]$. To form
the union or intersection of a family of sets indexed by $\mathbb{N}$, we could use notation like that
in the preceding:


\[ \bigcup_{n=1}^{\infty} A_n \quad \text{and} \quad \bigcap_{n=1}^{\infty} A_n, \tag{2.21} \]
or we could write something like


\[ \bigcup_{n \in \mathbb{N}} A_n \quad \text{and} \quad \bigcap_{n \in \mathbb{N}} A_n, \tag{2.22} \]


where by Eq. (2.22) we understand that $n$ is allowed to take on all values of the indexing
set $\mathbb{N}$.

The notation in Eq. (2.22) is handy when the indexing set is more complicated than
$\mathbb{N}$ and does not allow us to think of some index variable $n$ starting at 1 and progressing
sequentially off to infinity. For it’s conceivable that any set $\mathcal{A}$ can index a family of sets.
We can then address individual sets in the family as $A_\alpha$, where $\alpha \in \mathcal{A}$, and denote the
family $\mathcal{F} = \{A_\alpha\}_{\alpha \in \mathcal{A}}$. Union and intersection could then be written as


\[ \bigcup_{\alpha \in \mathcal{A}} A_\alpha \quad \text{and} \quad \bigcap_{\alpha \in \mathcal{A}} A_\alpha. \]


Example 2.2.1. One important contribution of the German mathematician Richard
Dedekind (1831-1916) is a rigorous foundation of the set of real numbers. He employed
what is now called a Dedekind cut, whereby the set of rational numbers is “cut” into two
pieces. Specifically, for some $r \in \mathbb{R}$, Dedekind worked with sets of the form


\[ A_r = \{x : x \in \mathbb{Q} \text{ and } x < r\}, \tag{2.23} \]
\[ B_r = \{x : x \in \mathbb{Q} \text{ and } x > r\}. \tag{2.24} \]


For example, although $\sqrt{2}$ is irrational (as you'll see in Theorem 2.8.3), it certainly
makes sense to talk about $A_{\sqrt{2}}$, the set of all rational numbers less than $\sqrt{2}$. The set $\mathbb{R}$ is
the index set for the families $\{A_r\}_{r \in \mathbb{R}}$ and $\{B_r\}_{r \in \mathbb{R}}$.


Example 2.2.2. Consider the family of intervals

\[ \mathcal{F} = \{[-r, r] : r \in \mathbb{R}^+\}, \tag{2.25} \]

where $\mathbb{R}^+$ denotes the positive real numbers. The index set is $\mathbb{R}^+$, and we might want to
use the notation $I_r = [-r, r]$ to represent one interval in the family. We might also write
$\mathcal{F} = \{I_r\}_{r \in \mathbb{R}^+}$.

Of course, we don't have to worry about having our family of sets so well organized
as to be indexed by any set at all. In the same way that we talk about a set $A$ and address an
arbitrary element $x \in A$, we can call our family of sets $\mathcal{F}$ and address an arbitrary set in
the family as $A \in \mathcal{F}$. Then the union and intersection of the sets in $\mathcal{F}$ could be written as

\[ \bigcup_{A \in \mathcal{F}} A \quad \text{and} \quad \bigcap_{A \in \mathcal{F}} A, \tag{2.26} \]

or more simply as

\[ \bigcup_{\mathcal{F}} A \quad \text{and} \quad \bigcap_{\mathcal{F}} A. \tag{2.27} \]

The statements in Eqs. (2.26) and (2.27) assume the least structure on the family of sets,
so they are a good general notation.

Before we really start digging into families of sets, we're going to set up a particular
example, a sort of scenario really, that will help carry us through the complexities of
the ideas and notation. Families of sets tend to confuse students at first, and for several
reasons. First, when we talk about a family, we're actually talking about a set whose
elements are sets. Thus it might be that $x \in A$ and $A \in \mathcal{F}$, but writing $x \in \mathcal{F}$ is wrong,
for $x$ is not an element of $\mathcal{F}$, but an element of an element of $\mathcal{F}$. Second, when a family
is indexed, the indexing set $\mathcal{A}$ is yet another set to contend with, a set whose elements $\alpha$
serve as something like name tags by which sets in the family are addressed. Therefore,
to try to make this as clear as we can, we're going to stage a little production whose cast
of characters all represent the set family terms. Then, as we state theorems, the scenario
will help us understand what's being said and how to approach the proof.


2.2 Indexed Families of Sets 


Let's suppose we're studying the mathematics section of the catalog of Prestigious
University. Every mathematics class has a number, so let

\[ \mathcal{A} = \{104, 210, 211, 212, 330, 360, 430, 431, 510, 531\}, \tag{2.28} \]

and think of $\mathcal{A}$ as the set of all the course numbers of mathematics courses offered at PU.
This set of course numbers $\mathcal{A}$ will be our index set. We'll address an arbitrarily chosen
course number as $\alpha \in \mathcal{A}$. Let $A_\alpha$ be the roster of all PU graduates who passed course
number $\alpha$ while they were students at PU. We'll address an arbitrarily chosen student on
roster $A_\alpha$ as $x$. With these role assignments, $\mathcal{F} = \{A_\alpha\}_{\alpha \in \mathcal{A}}$ is the set of all these rosters of
students who passed the individual courses. That is,

\[ \mathcal{F} = \{A_{104}, A_{210}, A_{211}, \ldots, A_{531}\}. \tag{2.29} \]

Writing the family of rosters as $\mathcal{F} = \{A_\alpha\}_{\alpha \in \mathcal{A}}$ references each roster in the family in terms
of the course number that tags it. Writing $\mathcal{F}$ simply as $\{A\}$ dispenses with the number
tags and addresses a particular roster in $\mathcal{F}$ simply as $A$.

Certainly the family of sets $\mathcal{F}$ and its indexing set $\mathcal{A}$ need not look anything like the
ones we've created here. But thinking of $\mathcal{A}$ as in Eq. (2.28) and $\mathcal{F}$ as in Eq. (2.29) should
not limit us or make our proofs less than generally applicable if we use the analogy as a
tool to help clarify our thinking. Just remember that we'll address sets in the family in
one of two ways. To illustrate the first way, sometimes we'll pick some arbitrary $\alpha \in \mathcal{A}$
in order to talk about $A_\alpha$, which is like choosing an arbitrary course number and talking
about the roster for the course with that number. Or perhaps we'll claim the existence
of some specific $\alpha_0 \in \mathcal{A}$ in order to talk about $A_{\alpha_0}$. In our analogy, this is the claim that
there is a certain course number whose roster has some property. To illustrate the second
way of addressing sets in the family, sometimes we might pick an arbitrary $A \in \mathcal{F}$, which
is like choosing an arbitrary mathematics roster without any specific reference to a course
number. Or perhaps we'll claim the existence of some specific $A_0 \in \mathcal{F}$. This is the claim
that there is a particular mathematics course roster that has a certain property, without
making any reference to that course's number.
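The two ways of addressing sets in the family can be mimicked in a few lines of Python. This is only a sketch: the rosters and student names below are invented for illustration (they are not data from any catalog), and a dict stands in for the indexed family, with course numbers as keys.

```python
# Hypothetical rosters: keys are course numbers (the index set),
# values are the sets A_alpha of graduates who passed that course.
rosters = {
    104: {"ann", "bob", "cy"},
    210: {"ann", "cy"},
    211: {"ann"},
}

index_set = set(rosters)                            # the index set
family = {frozenset(r) for r in rosters.values()}   # the family, number tags dropped

# First way: pick an index alpha, then talk about A_alpha.
exists_by_index = any("bob" in rosters[alpha] for alpha in index_set)

# Second way: pick a set A in the family directly, with no course number in sight.
exists_by_set = any("bob" in A for A in family)

print(exists_by_index, exists_by_set)
```

Both questions ask whether some roster contains the student; the first reaches the roster through its tag, the second does not.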

Having created this little scenario as an aid to our understanding, let's define more
terms and derive some results, using the scenario to motivate and get us over some humps.
First, how should we define the sets $\bigcup_{A \in \mathcal{F}} A$ and $\bigcap_{A \in \mathcal{F}} A$? Or, equivalently, how should
we define the following statements?³

\[ x \in \bigcup_{A \in \mathcal{F}} A \quad \text{and} \quad x \in \bigcap_{A \in \mathcal{F}} A \tag{2.30} \]

In effect, we want $\bigcup_{A \in \mathcal{F}} A$ to be the set you get when you take each $A$ in $\mathcal{F}$ and dump
all its elements into a single set. For our scenario, $\bigcup_{A \in \mathcal{F}} A$, or $\bigcup_{\alpha \in \mathcal{A}} A_\alpha$, however you
choose to write it, is the single roster of graduates created by taking the union of all the individual
mathematics class rosters. Maybe you can see that $\bigcup_{\mathcal{F}} A$ consists of all graduates who have
ever passed a mathematics class at PU. So if $x$ represents a graduate, saying $x \in \bigcup_{\mathcal{F}} A$ means
that there is some mathematics course at PU that $x$ passed. We can write this in two ways:

\[ x \in \bigcup_{A \in \mathcal{F}} A \iff x \in \bigcup_{\alpha \in \mathcal{A}} A_\alpha \iff (\exists A \in \mathcal{F})(x \in A) \tag{2.31} \]
\[ \iff (\exists \alpha \in \mathcal{A})(x \in A_\alpha). \tag{2.32} \]

³ Think in terms of $\exists$ for the union and $\forall$ for the intersection.

The form of Eq. (2.31) says simply that there is a mathematics course roster on which
the name of student $x$ appears, while Eq. (2.32) says there exists a course number $\alpha \in \mathcal{A}$
such that $x$ is on the roster of mathematics course number $\alpha$. With this, we arrive at the
following definition.


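Definition 2.2.3. Let $\mathcal{F}$ be a family of sets. Then the union over $\mathcal{F}$ is defined by

\[ \bigcup_{A \in \mathcal{F}} A = \{x : (\exists A \in \mathcal{F})(x \in A)\}. \tag{2.33} \]

If $\mathcal{F}$ is indexed by $\mathcal{A}$, this becomes

\[ \bigcup_{A \in \mathcal{F}} A = \bigcup_{\alpha \in \mathcal{A}} A_\alpha = \{x : (\exists \alpha \in \mathcal{A})(x \in A_\alpha)\}. \tag{2.34} \]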


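Now what does $\bigcap_{A \in \mathcal{F}} A$ (or $\bigcap_{\alpha \in \mathcal{A}} A_\alpha$) correspond to in our scenario? This is the
intersection of all rosters, so it's the list of all graduates who passed all mathematics
courses at PU. So what does it mean mathematically to say $x \in \bigcap_{\mathcal{F}} A$?

\[ x \in \bigcap_{A \in \mathcal{F}} A \iff x \in \bigcap_{\alpha \in \mathcal{A}} A_\alpha \iff (\forall A \in \mathcal{F})(x \in A) \tag{2.35} \]
\[ \iff (\forall \alpha \in \mathcal{A})(x \in A_\alpha). \tag{2.36} \]

With this we arrive at the following definition.

Definition 2.2.4. Let $\mathcal{F}$ be a family of sets. Then the intersection over $\mathcal{F}$ is defined by

\[ \bigcap_{A \in \mathcal{F}} A = \{x : (\forall A \in \mathcal{F})(x \in A)\}. \tag{2.37} \]

If $\mathcal{F}$ is indexed by $\mathcal{A}$, this becomes

\[ \bigcap_{A \in \mathcal{F}} A = \bigcap_{\alpha \in \mathcal{A}} A_\alpha = \{x : (\forall \alpha \in \mathcal{A})(x \in A_\alpha)\}. \tag{2.38} \]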
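Definitions 2.2.3 and 2.2.4 translate almost word for word into Python, where `any` plays the role of $\exists$ and `all` plays the role of $\forall$. The small family below is a made-up example, and restricting the candidates $x$ to the elements that appear somewhere in the family is an assumption that is harmless here because the family is nonempty.

```python
family = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]   # a hypothetical family F
candidates = set().union(*family)            # every x that appears in some A

# Definition 2.2.3: x is in the union iff SOME A in F contains x.
union_by_def = {x for x in candidates if any(x in A for A in family)}

# Definition 2.2.4: x is in the intersection iff EVERY A in F contains x.
inter_by_def = {x for x in candidates if all(x in A for A in family)}

print(sorted(union_by_def))   # [1, 2, 3, 4, 5]
print(sorted(inter_by_def))   # [3]

# Python's built-in set operations agree with the definitions.
print(union_by_def == set.union(*family))          # True
print(inter_by_def == set.intersection(*family))   # True
```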


A lot of the results we proved in Section 2.1 for a family of two or three sets carry
over to analogous results for a family of sets of any size. Here are some examples and
theorems. Do yourself a favor and try to work them on your own first.

Example 2.2.5. Construct the negation of the statement: $x \in \bigcup_{\mathcal{F}} A$.

Solution: Looking at Eq. (2.31), the statement $x \notin \bigcup_{\mathcal{F}} A$ means

\[ (\forall A \in \mathcal{F})(x \notin A). \]

If $\mathcal{F}$ is indexed by $\mathcal{A}$, we use Eq. (2.32) to have

\[ (\forall \alpha \in \mathcal{A})(x \notin A_\alpha). \] ∎

In our mathematics class scenario, what does $x \notin \bigcup_{\mathcal{F}} A$ mean? Graduate $x$ is not on
the universal roster of mathematics classes, so $x$ never passed a mathematics class at PU.




That is, for every roster $A \in \mathcal{F}$, $x$ is not on roster $A$, or, for every course number $\alpha \in \mathcal{A}$,
$x$ did not pass the mathematics course numbered $\alpha$.


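Theorem 2.2.6 (DeMorgan's Law). Suppose $\mathcal{F}$ is a family of sets. Then

\[ \left[ \bigcup_{\mathcal{F}} A \right]' = \bigcap_{\mathcal{F}} A'. \tag{2.39} \]

Proof: We prove by mutual subset inclusion.

($\subseteq$): Pick $x \in [\bigcup_{\mathcal{F}} A]'$. Then $x \notin \bigcup_{\mathcal{F}} A$. Therefore, for all $A \in \mathcal{F}$, $x \notin A$. But
then $x \in A'$ for every $A \in \mathcal{F}$. Thus, $x \in \bigcap_{\mathcal{F}} A'$.

($\supseteq$): Pick $x \in \bigcap_{\mathcal{F}} A'$. Then $x \in A'$ for every $A \in \mathcal{F}$. Thus, $x \notin A$ for every
$A \in \mathcal{F}$, so that $x \notin \bigcup_{\mathcal{F}} A$. Therefore, $x \in [\bigcup_{\mathcal{F}} A]'$. ∎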


To see what Theorem 2.2.6 says in our scenario, we need a universal set $U$ in which $A'$
is meaningful. Let $U$ be the roster of all PU graduates. With that, what set is being talked
about in Theorem 2.2.6, and what are the two ways of constructing it in Eq. (2.39)? To
construct the left-hand side of Eq. (2.39), we first combine all the mathematics rosters
into a single roster, then take all the PU graduates except these. This is the list of all
PU graduates who avoided mathematics altogether while they were at PU. How do we
construct the right-hand side of Eq. (2.39)? First, we take each mathematics class roster
and consider the complement. For example, $A'_{211}$ is the list of all PU graduates who did
not pass Mathematics 211. By taking the intersection of all these complements, we arrive
at the list of PU graduates who avoided Math 104 and Math 210 and ... and Math 531;
that is, the list of graduates who avoided mathematics altogether.
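Both forms of DeMorgan's law can be spot-checked on a small finite example. The sketch below is not a proof; the universal set `U` and the family are made-up values, and `complement` is a helper named here only for illustration.

```python
U = set(range(10))                        # a hypothetical universal set
family = [{1, 2, 3}, {2, 3, 4}, {3, 7}]   # a hypothetical family F

def complement(A):
    """A' relative to the universal set U."""
    return U - A

# Eq. (2.39): complement of the union versus intersection of the complements.
lhs = complement(set().union(*family))
rhs = set.intersection(*(complement(A) for A in family))
print(sorted(lhs), lhs == rhs)    # [0, 5, 6, 8, 9] True

# The other form: complement of the intersection versus union of the complements.
lhs2 = complement(set.intersection(*family))
rhs2 = set().union(*(complement(A) for A in family))
print(lhs2 == rhs2)               # True
```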

You'll prove the other form of DeMorgan's law, stated next, in Exercise 2.

Theorem 2.2.7. Suppose $\mathcal{F}$ is a family of sets. Then

\[ \left[ \bigcap_{\mathcal{F}} A \right]' = \bigcup_{\mathcal{F}} A'. \tag{2.40} \]

For the next theorem, we'll prove part 2 here and leave part 1 to you in Exercise 3.


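Theorem 2.2.8. Let $\mathcal{F}$ be a family of sets indexed by $\mathcal{A}$, and suppose $\mathcal{B} \subseteq \mathcal{A}$. Then,
1. $\bigcup_{\beta \in \mathcal{B}} A_\beta \subseteq \bigcup_{\alpha \in \mathcal{A}} A_\alpha$,
2. $\bigcap_{\beta \in \mathcal{B}} A_\beta \supseteq \bigcap_{\alpha \in \mathcal{A}} A_\alpha$.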


Before we prove part 2 of Theorem 2.2.8, let us see how it relates to our scenario.
Since $\mathcal{A}$ is the set of all mathematics course numbers at PU, it might work to think of $\mathcal{B}$
as the set of all course numbers of lower-level mathematics courses at PU. Thus

\[ \mathcal{B} = \{104, 210, 211, 212\}. \tag{2.41} \]

With that, what does part 2 of Theorem 2.2.8 say? The set construction on the left-hand
side is the intersection of the class rosters across all the lower-level courses, while
the right-hand side is the intersection of the rosters of all the mathematics classes.
Therefore, if we pick some $x \in \bigcap_{\alpha \in \mathcal{A}} A_\alpha$, then $x$ is a PU graduate who passed every
mathematics course offered. Thus, certainly $x$ passed all the lower-level courses. Here's
the proof.

Proof: Pick $x \in \bigcap_{\alpha \in \mathcal{A}} A_\alpha$. We must show that $x \in A_\beta$ for all $\beta \in \mathcal{B}$, so pick $\beta \in \mathcal{B}$.
Since $\mathcal{B} \subseteq \mathcal{A}$, it follows that $\beta \in \mathcal{A}$. Therefore, $x \in A_\beta$, because $x \in A_\alpha$ for any $\alpha \in \mathcal{A}$.
Since $\beta$ was chosen arbitrarily, we have shown that $x \in A_\beta$ for all $\beta \in \mathcal{B}$, so that
$x \in \bigcap_{\beta \in \mathcal{B}} A_\beta$. ∎

Notice how we showed $x \in \bigcap_{\beta \in \mathcal{B}} A_\beta$ by picking an arbitrary $\beta \in \mathcal{B}$ and showing that
$x \in A_\beta$. This shows that $x \in A_\beta$ for all $\beta \in \mathcal{B}$, so that $x \in \bigcap_{\beta \in \mathcal{B}} A_\beta$.
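To watch both parts of Theorem 2.2.8 at work, here is a short Python sketch. The rosters and names are invented for illustration, and `lower` is a hypothetical sub-index set standing in for $\mathcal{B}$; a single example like this illustrates the inclusions but proves nothing in general.

```python
# Hypothetical rosters indexed by course number.
rosters = {
    104: {"ann", "bob", "cy"},
    210: {"ann", "cy"},
    330: {"ann", "dee"},
    531: {"ann"},
}
lower = {104, 210}   # the sub-index set B, a subset of the keys

union_B = set().union(*(rosters[b] for b in lower))
union_A = set().union(*rosters.values())
inter_B = set.intersection(*(rosters[b] for b in lower))
inter_A = set.intersection(*rosters.values())

print(union_B <= union_A)   # True: part 1, the smaller index set gives a smaller union
print(inter_B >= inter_A)   # True: part 2, the smaller index set gives a larger intersection
```

Notice that the inclusion reverses direction between the two parts: intersecting over fewer rosters imposes fewer conditions on $x$.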

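We can write Theorem 2.2.8 in a slightly different form if the family of sets is not
indexed. If $\mathcal{F}_1$ is a family of sets and $\mathcal{F}_2 \subseteq \mathcal{F}_1$, we call $\mathcal{F}_2$ a subfamily of $\mathcal{F}_1$. Since
$\mathcal{B} \subseteq \mathcal{A}$ in Theorem 2.2.8, $\{A_\beta\}_{\beta \in \mathcal{B}}$ is a subfamily of $\{A_\alpha\}_{\alpha \in \mathcal{A}}$. Swapping the notation in
Theorem 2.2.8 for an arbitrary family $\mathcal{F}_1$ and a subfamily $\mathcal{F}_2$, we have the following.

Theorem 2.2.9. Suppose $\mathcal{F}_1$ is a family of sets, and $\mathcal{F}_2$ is a subfamily of $\mathcal{F}_1$. Then,
1. $\bigcup_{\mathcal{F}_2} A \subseteq \bigcup_{\mathcal{F}_1} A$,
2. $\bigcap_{\mathcal{F}_2} A \supseteq \bigcap_{\mathcal{F}_1} A$.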


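The next theorem involves two families of sets, both indexed by $\mathcal{A}$, where corresponding
sets in the two families are related by subset inclusion. To understand what it's
saying, think of $B_\alpha$ as the set of all male PU graduates who passed mathematics course
number $\alpha$. You'll prove both parts in Exercise 4.

Theorem 2.2.10. Suppose $\mathcal{F}_1 = \{A_\alpha\}_{\alpha \in \mathcal{A}}$ and $\mathcal{F}_2 = \{B_\alpha\}_{\alpha \in \mathcal{A}}$ are two families of sets with
the property that $B_\alpha \subseteq A_\alpha$ for every $\alpha \in \mathcal{A}$. Then

1. $\bigcup_{\alpha \in \mathcal{A}} B_\alpha \subseteq \bigcup_{\alpha \in \mathcal{A}} A_\alpha$,
2. $\bigcap_{\alpha \in \mathcal{A}} B_\alpha \subseteq \bigcap_{\alpha \in \mathcal{A}} A_\alpha$.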


EXERCISES 


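1. Construct the negation of the statement $x \in \bigcap_{\mathcal{F}} A$.

2. Prove Theorem 2.2.7: Suppose $\mathcal{F}$ is a family of sets. Then

\[ \left[ \bigcap_{\mathcal{F}} A \right]' = \bigcup_{\mathcal{F}} A'. \tag{2.42} \]

3. Prove part 1 of Theorem 2.2.8: Let $\mathcal{F}$ be a family of sets indexed by $\mathcal{A}$, and suppose
$\mathcal{B} \subseteq \mathcal{A}$. Then, $\bigcup_{\beta \in \mathcal{B}} A_\beta \subseteq \bigcup_{\alpha \in \mathcal{A}} A_\alpha$.

4. Prove Theorem 2.2.10: Suppose $\mathcal{F}_1 = \{A_\alpha\}_{\alpha \in \mathcal{A}}$ and $\mathcal{F}_2 = \{B_\alpha\}_{\alpha \in \mathcal{A}}$ are two families of
sets with the property that $B_\alpha \subseteq A_\alpha$ for every $\alpha \in \mathcal{A}$. Then

(a) $\bigcup_{\alpha \in \mathcal{A}} B_\alpha \subseteq \bigcup_{\alpha \in \mathcal{A}} A_\alpha$,
(b) $\bigcap_{\alpha \in \mathcal{A}} B_\alpha \subseteq \bigcap_{\alpha \in \mathcal{A}} A_\alpha$.

5. Suppose $\mathcal{F} = \{A\}$ is a family of sets, and suppose $C$ is a set for which $A \subseteq C$ for every
$A \in \mathcal{F}$. Show that $\bigcup_{\mathcal{F}} A \subseteq C$.⁴

6. Suppose $\mathcal{F} = \{A\}$ is a family of sets, and suppose $D$ is a set for which $D \subseteq A$ for every
$A \in \mathcal{F}$. Show that $D \subseteq \bigcap_{\mathcal{F}} A$.⁵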

2.3 Algebraic and Ordering Properties of $\mathbb{R}$


In this section we turn our attention to proofs of basic algebraic and ordering properties
of the real numbers. You might want to take a look at assumptions A1-A22 in
Chapter 0 before you continue reading. It is only these properties of $\mathbb{R}$ that we can
use for free. Anything else, we will have to justify from these assumptions. In this section,
we'll make use of assumptions A1-A18 to derive basic and familiar algebraic and
ordering properties of the real numbers. The theorems and examples that follow are
designed to do two things. First, they will give you a feel for how to write proofs of
this sort. The proofs that follow may be a little wordy, but the excessive explanation
will probably help you as you begin to write these kinds of proofs. Second, they will
serve as assumptions for the exercises and theorems from later sections. Once a theorem
is proved, then you are certainly free to use it later, with a proper reference, of
course.


2.3.1 Basic algebraic properties of real numbers 
Theorem 2.3.1 (Cancellation of addition). For all $a, b, c \in \mathbb{R}$, if $a + c = b + c$, then
$a = b$.

This is one of the most basic algebraic properties of $\mathbb{R}$: we would like to know that
we are free to take a given equation $a + c = b + c$ and, as it were, cancel out the $+c$
from both sides. Notice the theorem is a statement of the form $p \to q$, where $p$ is the
statement $a + c = b + c$, and $q$ is the statement $a = b$.

We're going to present two proofs of Theorem 2.3.1, written in two somewhat 
different styles. The first proof reveals the thought process involved in constructing 
the proof, but uses several disjoint equations. The second shows how you might clean up 
these equations to create one extended equation that is the result we want. 


Proof 1: Suppose $a + c = b + c$. By property A7, there exists $-c \in \mathbb{R}$ such that
$c + (-c) = 0$. Since addition is well-defined (A2), we have that $a + c + (-c) =
b + c + (-c)$, which yields $a + 0 = b + 0$, or $a = b$. ∎


⁴ Think of $C$ as the set of all PU graduates who ever enrolled in a mathematics class.
⁵ Think of $D$ as the set of all PU mathematics majors (whom we'll assume would have taken every
mathematics course) who graduated with a 4.00 GPA.
Proof 2: Suppose $a + c = b + c$. By property A7, there exists $-c \in \mathbb{R}$ such that
$c + (-c) = 0$. We use this together with other properties from A1-A22 to have that

\[ a = a + 0 = a + [c + (-c)] = (a + c) + (-c) = (b + c) + (-c) = b + [c + (-c)] = b + 0 = b. \tag{2.43} \]

By assumption A1 (the transitive property of equality), $a = b$. ∎

Theorem 2.3.1 becomes useful immediately in the following.

Theorem 2.3.2. For every $a \in \mathbb{R}$, $a \cdot 0 = 0$.

Proof: Pick $a \in \mathbb{R}$. By properties A6 and A14, we have that

\[ 0 + a \cdot 0 = a \cdot 0 = a \cdot (0 + 0) = a \cdot 0 + a \cdot 0. \tag{2.44} \]

By Theorem 2.3.1, we may cancel $a \cdot 0$ from both sides of Eq. (2.44) to have $a \cdot 0 = 0$. ∎


We'll eventually get a little lazy about referencing the properties of R that we need 
to use. At first, however, we need to make sure for our own sakes that we understand 
precisely which ones we use and when we use them. 

In Section 1.3.3, we discussed unique existence. The next theorem is our first uniqueness
proof. Property A7 guarantees that every $a \in \mathbb{R}$ has an additive inverse $-a$ for which
$a + (-a) = 0$. This is an axiom. But nothing about the axioms A1-A22 precludes (at first
glance) that a real number might have more than one additive inverse. The next theorem
shows that, in fact, $a \in \mathbb{R}$ has a unique additive inverse. Once we prove this, we will then
be able to talk about the additive inverse of $a$. Remember how we show that something is
unique. Suppose there are two things that both have the property in question, and show
that they must, in actuality, be the same.


Theorem 2.3.3. The additive inverse of a real number is unique. 


Proof: Pick $a \in \mathbb{R}$. Suppose $b, c \in \mathbb{R}$ are both additive inverses of $a$. Then
$a + b = 0$ and $a + c = 0$. We must show that $b = c$. Now since $a + b = 0$ and
$a + c = 0$, then by assumption A1, the transitive property of equality, $a + b =
a + c$. But by Theorem 2.3.1, $b = c$. Thus the additive inverse of $a$ is unique. ∎


Now that we know $-a$ is unique, we can make the following important observation.
For every $a \in \mathbb{R}$, $-a \in \mathbb{R}$ is the number such that $a + (-a) = 0$. Not only can you
read this equation as saying that $-a$ is the additive inverse of $a$, you can also read it as
saying that $a$ is the additive inverse of $-a$. In other words, this observation is a proof of
the following theorem.

Theorem 2.3.4. For every $a \in \mathbb{R}$, $-(-a) = a$.


The behavior of 0 yields the next theorem. 




Theorem 2.3.5. $-0 = 0$.

Proof: Since $-0$ is the additive inverse of 0, by definition it satisfies $0 + (-0) = 0$.
But for any $a \in \mathbb{R}$, $0 + a = a$. Thus $0 + (-0) = -0$. Thus we have $0 + (-0) = 0$
and $0 + (-0) = -0$. By the transitive property of equality, $-0 = 0$. ∎


Here's another algebraic property of real numbers with a hint for its proof (Exercise 1).
The first part of Theorem 2.3.6 is a statement about $-(ab)$, the additive inverse of $ab$.
To show that $(-a)b = -(ab)$, what you want to do is show that the expression $(-a)b$
exhibits the behavior that the additive inverse of $ab$ ought to exhibit; then Theorem 2.3.3
guarantees that $(-a)b$ is that unique inverse. That is, $-(ab) = (-a)b$. The paragraph
before Theorem 2.3.4 might give you insight into this proof. Also, recall assumption A18.


Theorem 2.3.6. If $a, b \in \mathbb{R}$, then
1. $(-a)b = -(ab)$,
2. $(-a)(-b) = ab$.

Sometimes the proof of a theorem can yield a special result that is important in its
own right, or at least might be an observation that needs to be made. A simple theorem
that follows immediately from another more complex theorem is called a corollary.

Corollary 2.3.7. If $b \in \mathbb{R}$, then $(-1)b = -b$.

Proof: Let $a = 1$ in part 1 of Theorem 2.3.6. ∎

Corollary 2.3.8. If $a, b \in \mathbb{R}$, then $-(a + b) = (-a) + (-b)$.

Proof: By Corollary 2.3.7 and the distributive property,

\[ -(a + b) = (-1)(a + b) = (-1)a + (-1)b = (-a) + (-b). \tag{2.45} \] ∎


Corollary 2.3.8 gives us the right to make a statement such as the following: "The
additive inverse of a sum is the sum of the additive inverses." It is quite common in
mathematics to investigate the truth or falsity of a statement of the form "The X of the Y is
equal to the Y of the X." Sometimes it is blatantly false. For example, let X = "square root"
and let Y = "sum" and you have the statement $\sqrt{a + b} = \sqrt{a} + \sqrt{b}$. Sometimes, however,
such a statement is true, and you're very glad to know that it is.
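A single counterexample is enough to sink the square root version. The values $a = 9$ and $b = 16$ below are our own choice, picked because every square root involved is a whole number; the last line checks one instance of Corollary 2.3.8 for contrast.

```python
import math

a, b = 9, 16

sqrt_of_sum = math.sqrt(a + b)                # square root of the sum
sum_of_sqrts = math.sqrt(a) + math.sqrt(b)    # sum of the square roots
print(sqrt_of_sum, sum_of_sqrts)              # 5.0 7.0

# One instance of "the additive inverse of a sum is the sum of the additive inverses".
print(-(a + b) == (-a) + (-b))                # True
```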

Here are results similar to some of those in the preceding, but for multiplication.
You'll prove these in Exercises 2-5.

Theorem 2.3.9. If $ac = bc$ and $c \ne 0$, then $a = b$.

Theorem 2.3.10. The multiplicative inverse of $a \ne 0$ is unique.

Theorem 2.3.11. For all $a \ne 0$, $(a^{-1})^{-1} = a$.

Theorem 2.3.12. For all nonzero $a, b \in \mathbb{R}$, $(ab)^{-1} = a^{-1} b^{-1}$.

Suppose we've chosen some nonzero $a \in \mathbb{R}$. First, let's take its additive inverse $-a$,
then let's take the multiplicative inverse of that to have $(-a)^{-1}$. Now start with $a$ again,
and do the same two processes in reverse order. First, take the multiplicative inverse of $a$
to have $a^{-1}$, then take the additive inverse of that to have $-(a^{-1})$. You probably expect
that these two processes done in reverse order produce the same result, but if so, it must
be demonstrated. Look closely at the proof of Theorem 2.3.13, for this kind of stunt can
come in handy sometimes.

Theorem 2.3.13. For all nonzero $a \in \mathbb{R}$, $(-a)^{-1} = -(a^{-1})$.

Proof: Suppose $a \ne 0$. Then there exists $a^{-1} \in \mathbb{R}$, and by part 2 of Theorem 2.3.6,

\[ 1 = a \cdot a^{-1} = (-a)[-(a^{-1})]. \tag{2.46} \]

Since $a \ne 0$, then neither is $-a$, else $a = -(-a) = -0 = 0$ by Theorems 2.3.4 and 2.3.5. Thus there
exists $(-a)^{-1} \in \mathbb{R}$, and we may multiply both sides of Eq. (2.46) by $(-a)^{-1}$ to have
$(-a)^{-1} = -(a^{-1})$. ∎


2.3.2 Ordering of the real numbers

In Chapter 0, we discussed the trichotomy law (A16). The way we address the sign of real
numbers is by assuming that every nonzero real number can, in a sense, be compared
to zero in one way or another by the symbol $>$. If $a > 0$, we call $a$ positive, and if
$0 > a$ (or $a < 0$), we call $a$ negative. Notationally, we write the positive and negative real
numbers as $\mathbb{R}^+$ and $\mathbb{R}^-$, respectively. Assumptions A17 and A18 describe some assumed
behaviors of addition and multiplication in $\mathbb{R}^+$, namely, that it is closed under addition
and multiplication. By splitting $\mathbb{R}$ into these three pieces, that is, $\mathbb{R}^+$, $\{0\}$, and $\mathbb{R}^-$, we
can then assign meaning to the statement $a > b$ by declaring $a > b \iff a - b > 0$.
(See Definition 0.2.1.) Thus, anytime we see a statement $a > b$, it means precisely that
$a - b > 0$. Similarly, if someone asks you to demonstrate that $a > b$, you can do it by
showing $a - b > 0$. This definition, with Theorem 2.3.4, allows us to prove the following.


Theorem 2.3.14. If $a > b$, then $-a < -b$.

In the proof, we'll suppose $a > b$, to have $a - b > 0$. To arrive at $-a < -b$, which
is the same as $-b > -a$, we must arrive by way of $-b - (-a) > 0$, for this is what the
statement $-b > -a$ means. Thus, the heart of the proof will be to transform $a - b > 0$
into $-b - (-a) > 0$. Easy enough.

Proof: Suppose $a > b$. Then $a - b > 0$. But $a = -(-a)$, so that $-(-a) - b > 0$,
which we may write as $-(-a) + (-b) > 0$. By commutativity, $(-b) + [-(-a)] > 0$,
or $-b - (-a) > 0$, so that $-b > -a$. Hence, $-a < -b$. ∎

Corollary 2.3.15. Suppose $c \in \mathbb{R}$. Then $c > 0$ if and only if $-c < 0$.

Proof: If $c > 0$, then let $a = c$ and $b = 0$ in Theorem 2.3.14 to have $-c < -0 = 0$.
Conversely, if $-c < 0$, then let $b = -c$ and $a = 0$ in Theorem 2.3.14 to have
$-0 < -(-c)$, or $c > 0$. ∎




Assumption A15 states that $1 \ne 0$, which might seem silly, but actually is an essential
assumption. You'll show in Exercise 8 that the assumption $1 = 0$ causes the entire set
of real numbers to collapse to the single element set $\{0\}$. Thus, since $1 \ne 0$, then by the
trichotomy law either $1 > 0$ or $1 < 0$. You'll show in Exercise 9 that $1 > 0$ is the only
remaining possibility, for $1 < 0$ produces a contradiction.


2.3.3 Absolute value


Undoubtedly you are familiar with the absolute value of a real number $x$.

Definition 2.3.16. For $x \in \mathbb{R}$, we define $|x|$, the absolute value of $x$, by

\[ |x| = \begin{cases} x, & \text{if } x \ge 0; \\ -x, & \text{if } x < 0. \end{cases} \tag{2.47} \]


Absolute value is a very important measure of the size of a real number. We can make
two observations right off the bat:

(N1) For every $x \in \mathbb{R}$, $|x| \ge 0$.
(N2) $|x| = 0$ if and only if $x = 0$.

Property N1 follows directly from Corollary 2.3.15. For if we pick $x \in \mathbb{R}$, then either
$x \ge 0$ or $x < 0$. If $x \ge 0$, then $|x| = x \ge 0$. However, if $x < 0$, then $|x| = -x > 0$.
Property N2 follows directly from Corollary 2.3.15 and the trichotomy law. The reason
we point these out is that properties N1-N2 are two of the three defining properties of
a norm, a very important term in analysis. Given a set (perhaps of numbers, functions,
sets), a norm $\|\cdot\|$ is a measure of the size of its elements. There is a third property of a
norm that $|x|$ has, and we'll see it in Theorem 2.3.22.
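The piecewise definition in Eq. (2.47) can be written directly as a function, and N1 and N2 can then be spot-checked on a handful of values. This is a sketch only: `absolute_value` is our own name for the function, the sample of fractions is arbitrary, and checking finitely many values is no substitute for the argument above.

```python
from fractions import Fraction

def absolute_value(x):
    """|x| as in Definition 2.3.16: x if x >= 0, and -x if x < 0."""
    return x if x >= 0 else -x

# Hypothetical sample: the fractions k/3 for -9 <= k <= 9.
samples = [Fraction(k, 3) for k in range(-9, 10)]

n1 = all(absolute_value(x) >= 0 for x in samples)                # (N1)
n2 = all((absolute_value(x) == 0) == (x == 0) for x in samples)  # (N2)
agrees = all(absolute_value(x) == abs(x) for x in samples)       # matches built-in abs

print(n1, n2, agrees)   # True True True
```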

First, let's explore some of the simplest and most familiar behaviors of $|x|$. We'll prove
a few, either wholly or in part, and leave some to you in the exercises. Absolute value
proofs often involve multiple cases because $|x|$ is defined piecewise. The first one is really
easy, so it's all yours (Exercise 13).

Theorem 2.3.17. For all $x \in \mathbb{R}$, $|-x| = |x|$.

Theorem 2.3.18. Suppose $a \ge 0$. Then $|x| = a$ if and only if $x = \pm a$.

Proof: Suppose $a \ge 0$.

($\Rightarrow$) Suppose $|x| = a$. If $x \ge 0$, then $x = |x| = a$. If $x < 0$, then
$x = -|x| = -a$. In either case, $x = \pm a$.

($\Leftarrow$) Suppose $x = \pm a$. For the case $x = a$, we have that $x = a \ge 0$, so that
$|x| = x = a$. For the case $x = -a$, we have that $x \le 0$. If $x = 0$, then $a = 0$
also, and $|x| = 0 = a$. But if $x < 0$, then $|x| = -x = a$. In all these cases,
$|x| = a$. ∎
We'll prove the $\Rightarrow$ direction of Theorem 2.3.19 here, and leave the $\Leftarrow$ direction to
you in Exercise 16. It might seem weird that we're going to suppose $a \ge 0$ only to state
in the proof that $a = 0$ is impossible. The reason is that we want consistency between
the hypothesis conditions of Theorems 2.3.18-2.3.21.

Theorem 2.3.19. Suppose $a \ge 0$. Then $|x| < a$ if and only if $-a < x < a$.

Proof: Let $a \ge 0$.

($\Rightarrow$) Suppose $|x| < a$. Since $|x| \ge 0$, it must be that $0 \le |x| < a$, so $a = 0$ is
impossible. If $x \ge 0$, then

\[ -a < 0 \le |x| = x < a. \tag{2.48} \]

On the other hand, if $x < 0$, we may write $-|x| > -a$ to have

\[ -a < -|x| = -(-x) = x < 0 < a. \tag{2.49} \]

In either case, we have $-a < x < a$. ∎


Let's apply Exercise 3i from Section 1.2 by defining the following statements:

\[ p: |x| = a \qquad q: x = \pm a \qquad r: |x| < a \qquad s: -a < x < a. \tag{2.50} \]

Since $(p \leftrightarrow q) \wedge (r \leftrightarrow s)$ is stronger than $(p \vee r) \leftrightarrow (q \vee s)$, we can combine
Theorems 2.3.18 and 2.3.19 to have the following.

Corollary 2.3.20. Suppose $a \ge 0$. Then $|x| \le a$ if and only if $-a \le x \le a$.

Corollary 2.3.20 should make your proof of the next theorem quick (Exercise 17).

Theorem 2.3.21. Suppose $a \ge 0$. Then $|x| > a$ if and only if either $x > a$ or $x < -a$.


Now that we have some practice at absolute value proofs, let's look at the third
property of a norm, and show that $|x|$ has this important property.

Theorem 2.3.22 (N3: Triangle Inequality). For all $x, y \in \mathbb{R}$,

\[ |x + y| \le |x| + |y|. \tag{2.51} \]

The proof of Theorem 2.3.22 is left to you in Exercise 18. What makes it more
complicated than the previous absolute value theorems is the presence of $x$, $y$, and
$x + y$. The ways $|x|$, $|y|$, and $|x + y|$ are calculated depend on the signs of $x$, $y$, and
$x + y$, respectively. Thankfully, some of the cases can be consolidated "without loss of
generality." There will be a hint if you need it.

Another triangle-type inequality will prove to be important in Part II of this text. It
can be proved in two lines from Theorem 2.3.22, if you can see how to apply it creatively
(Exercise 19).

Theorem 2.3.23. For all $x, y \in \mathbb{R}$, $|x - y| \ge |x| - |y|$.




There is one more triangle-type inequality that we'll need in Section 4.6. The proof
requires some sneaky application of several of the results we've shown so far. You'll tackle
it in Exercise 20, with hints if you need them.

Theorem 2.3.24. For all $x, y \in \mathbb{R}$, $\big| |x| - |y| \big| \le |x - y|$.
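Before attempting the proofs, it can be reassuring to test the three triangle-type inequalities numerically. The sketch below checks them over a small grid of fractions of our own choosing; passing such a check is evidence, not proof, and exact `Fraction` arithmetic is used so that rounding plays no part.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample: the fractions k/4 for -12 <= k <= 12.
samples = [Fraction(k, 4) for k in range(-12, 13)]
pairs = list(product(samples, repeat=2))

thm_2_3_22 = all(abs(x + y) <= abs(x) + abs(y) for x, y in pairs)
thm_2_3_23 = all(abs(x - y) >= abs(x) - abs(y) for x, y in pairs)
thm_2_3_24 = all(abs(abs(x) - abs(y)) <= abs(x - y) for x, y in pairs)

print(thm_2_3_22, thm_2_3_23, thm_2_3_24)   # True True True
```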


EXERCISES 


1. Prove Theorem 2.3.6: If $a, b \in \mathbb{R}$, then:
(a) $(-a)b = -(ab)$.
(b) $(-a)(-b) = ab$.⁶

2. Prove Theorem 2.3.9: If $ac = bc$ and $c \ne 0$, then $a = b$.

3. Prove Theorem 2.3.10: The multiplicative inverse of $a \ne 0$ is unique.

4. Using reasoning similar to the argument for Theorem 2.3.4, prove Theorem 2.3.11:
For all $a \ne 0$, $(a^{-1})^{-1} = a$.

5. Prove Theorem 2.3.12: For all nonzero $a, b \in \mathbb{R}$, $(ab)^{-1} = a^{-1} b^{-1}$.

6. Prove the principle of zero products: If $ab = 0$, then either $a = 0$ or $b = 0$.⁷

7. Prove $(a + b)(c + d) = ac + ad + bc + bd$ for all $a, b, c, d \in \mathbb{R}$.

8. Suppose we replace assumption A15 with the assumption that $1 = 0$. Show that,
with this assumption, there are no nonzero real numbers.

9. Prove the following.
(a) If $a < b$, then $a + c < b + c$.
(b) If $a < b$ and $b < c$, then $a < c$.
(c) If $a > b$ and $b > c$, then $a > c$.
(d) If $a < 0$ and $b < 0$, then $a + b < 0$.
(e) If $a > 0$ and $b < 0$, then $ab < 0$.
(f) If $a < 0$ and $b < 0$, then $ab > 0$.
(g) If $a < b$ and $c > 0$, then $ac < bc$.
(h) If $a < b$ and $c < 0$, then $ac > bc$.
(i) If $0 < a < b$, then $a^2 < b^2$.
(j) If $a < b < 0$, then $a^2 > b^2$.
(k) $1 > 0$.
(l) For $a \in \mathbb{R}$, write $a^2 = a \cdot a$. Show that for every $a \in \mathbb{R}$, $a^2 \ge 0$.
(m) Explain why the equation $x^2 = -1$ has no solution $x \in \mathbb{R}$.

⁶ Part a and Theorem 2.3.4 should make this one quick.
⁷ See Exercise 3g from Section 1.2.
10. Prove the following:
(a) $0^{-1}$ does not exist in $\mathbb{R}$.
(b) If $a > 0$, then $a^{-1} > 0$.
(c) If $a < 0$, then $a^{-1} < 0$.
(d) $c > 1$ if and only if $0 < c^{-1} < 1$.⁸
(e) If $a > 0$ and $c > 1$, then $a/c < a$.
(f) If $a, b \in \mathbb{Z}$ and $ab = 1$, then $a = b = \pm 1$.⁹

11. If $a, b \in \mathbb{R} \setminus \{0\}$ and $a < b$, does it follow that $1/a < 1/b$? Use results from Exercises 9
and 10 to state and prove the relationship between $1/a$ and $1/b$ depending on the
signs of $a$ and $b$.

12. Prove that if $a < b$ are real numbers, then $a < (a + b)/2 < b$. (How do you know
that $2 > 0$? What exactly is 2 anyway?)

13. Prove Theorem 2.3.17: For all $x \in \mathbb{R}$, $|-x| = |x|$.

14. Prove that for all $x \in \mathbb{R}$, $-|x| \le x \le |x|$.

15. Suppose $x, y \in \mathbb{R}$. Prove the following.
(a) $|xy| = |x| |y|$.
(b) If $x \ne 0$, then $|x^{-1}| = |x|^{-1}$.¹⁰
(c) If $y \ne 0$, then $|x/y| = |x| / |y|$.

16. Prove the $\Leftarrow$ direction of Theorem 2.3.19: Suppose $a \ge 0$. Then $|x| < a$ if $-a <
x < a$.

17. Prove Theorem 2.3.21: Suppose $a \ge 0$. Then $|x| > a$ if and only if either $x > a$ or
$x < -a$.¹¹

18. Prove Theorem 2.3.22: If $x, y \in \mathbb{R}$, then $|x + y| \le |x| + |y|$.¹²

19. Prove Theorem 2.3.23: For all $x, y \in \mathbb{R}$, $|x - y| \ge |x| - |y|$.¹³,¹⁴

20. Prove Theorem 2.3.24: For all $x, y \in \mathbb{R}$, $\big| |x| - |y| \big| \le |x - y|$.¹⁵

21. There are two theorems and one exercise in this section besides Corollary 2.3.8 that
can be worded as "The X of the Y is equal to the Y of the X." Find them, and state
them in this form.


⁸ To show $0 < c^{-1} < 1$, you must show the two inequalities $0 < c^{-1}$ and $c^{-1} < 1$ separately.
⁹ There are five cases: $a = -1$, $a = 0$, $a = 1$, $a > 1$, and $a < -1$. Three will produce a contradiction.
¹⁰ Use a technique like the proof of Theorem 2.3.13.
¹¹ See Exercise 3j from Section 1.2.
¹² The case $x \ge 0$ and $y < 0$ is, without any loss of generality, the same as $x < 0$ and $y \ge 0$. However,
the case $x \ge 0$ and $y < 0$ generates two subcases, depending on the sign of $x + y$.
¹³ One of the mathematician's most useful tricks is knowing when and how to add zero. Start with
Theorem 2.3.22, and reassign the roles of $x$ and $y$. Don't look at the next hint unless you have to.
¹⁴ Start with $|x|$ by itself and add zero inside the absolute value.
¹⁵ Apply Corollary 2.3.20 by writing the expression in Exercise 19 in two ways, once switching the roles
of $x$ and $y$.
2.4 The Principle of Mathematical Induction


Take another look at property A19, the well-ordering principle in Chapter 0. In the world
of mathematics, the well-ordering principle (WOP) is often taken as an axiom. In this
section, we derive a theorem based on the WOP called the principle of mathematical
induction (PMI). In high school, you might have done what are called proofs by induction,
where you built an argument that was analogous to knocking down an infinite row of
dominoes. First, you showed figuratively that you can knock the first domino down.
Then you showed that if the $n$th domino falls, then so does the $(n + 1)$st. This very, very
important proof technique is useful when the theorem you're trying to prove has a form
like one of these:

\[ \sum_{k=1}^{n} k = 1 + 2 + 3 + \cdots + n = \frac{n(n + 1)}{2} \tag{2.52} \]

or perhaps

\[ A \cup (B_1 \cap B_2 \cap \cdots \cap B_n) = (A \cup B_1) \cap (A \cup B_2) \cap \cdots \cap (A \cup B_n) \tag{2.53} \]

where the theorem makes a statement about a finite but unspecified number $n$ of things,
and you want to prove that the claim is true for any $n \in \mathbb{N}$.

You might have found Eq. (2.52) handy if you had been in grammar school with 
Carl Friedrich Gauss in the 1780s. Gauss, a very precocious child, showed amazing 
mathematical ability at a very early age. A somewhat embellished story goes that when 
Gauss was eight years old, it was raining one day during recess, and Internet access was 
down. His teacher needed to keep the children in the class busy for a while, so he told 
them to add up the first 100 natural numbers without their calculators. Gauss figured 
out how to get the result quickly in the following way. By writing the sum twice, once in 
reverse order, he added vertically, term by term: 


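\[
\begin{array}{rcrcrcrcccrcr}
1 & + & 2 & + & 3 & + & 4 & + & \cdots & + & 99 & + & 100 \\
100 & + & 99 & + & 98 & + & 97 & + & \cdots & + & 2 & + & 1 \\ \hline
101 & + & 101 & + & 101 & + & 101 & + & \cdots & + & 101 & + & 101
\end{array}
\qquad \text{(100 terms)}
\]

Gauss observed that $100 \times 101$ is twice the desired result, so he quickly reported the
result of 5050. If you perform a similar trick replacing 100 with an arbitrary $n$, you get
Eq. (2.52).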
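Gauss's pairing trick is easy to check by brute force. The following sketch compares the closed form in Eq. (2.52) with a direct sum; the function name `gauss_sum` and the range of $n$ tested are our own choices, and agreement for finitely many $n$ is not a proof for all $n \in \mathbb{N}$.

```python
def gauss_sum(n):
    """The closed form n(n + 1)/2 from Eq. (2.52)."""
    return n * (n + 1) // 2

print(gauss_sum(100))   # 5050

# Compare the closed form with term-by-term addition for n = 1, ..., 200.
matches = all(sum(range(1, n + 1)) == gauss_sum(n) for n in range(1, 201))
print(matches)          # True
```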

Although Gauss’ technique might seem sufficient as a proof, there is something a 
little disconcerting about making a claim that involves a “dot dot dot” in it. The PMI is 
a theorem derived from the WOP that eliminates this untidiness. 

So what is the PMI? And what does the WOP have to do with it? Let’s conduct 
a thought experiment to set it up, first in its standard form. Then we'll look at some 
variations. 

2.4.1 The standard PMI 

Suppose we consider a set S, which is assumed to be a subset of the natural numbers N, 
and suppose S is known to have the following properties: 


(I1) 1 ∈ S 
(I2) If n ≥ 1 and n ∈ S, then n + 1 ∈ S. 


The question we ask is “What, precisely, is S?” Now I1 says 1 ∈ S, but then I2 applies 
to guarantee that, since 1 ∈ S, then 1 + 1 = 2 ∈ S. But then I2 applies again to guarantee 
that 2 + 1 = 3 ∈ S, and so forth. Now S ⊆ N is assumed, and it appears that every 
natural number is also in S, so that N ⊆ S. Thus it appears that S = N. Fair enough. But 
this argument has its own “dot dot dot” and is a little flimsy. 

Another way to look at this same argument is still a little open-ended, but comes closer 
to the actual proof as we'll present it. If it were true that S ≠ N, then since S ⊆ N, it must 
be that N ⊈ S. Thus there exists k ∈ N such that k ∉ S. By I1, k ≠ 1, so that k − 1 ∈ N. 
By the contrapositive of I2, k − 1 ∉ S either. Therefore, k − 1 ≠ 1, so that k − 2 ∈ N, 
and since k − 1 ∉ S, I2 implies k − 2 ∉ S. This is where the “dot dot dot” comes in, and 
we have that k, k − 1, k − 2, ... ∉ S. But this runs head-on into the fact that 1 ∈ S. 

If we look to the WOP, then we can clean up the untidiness in these arguments and 
find in that assumption enough strength to provide a rigorous proof that any set with 
properties I1-I2 must, in fact, be N. This is the principle of mathematical induction. 


Theorem 2.4.1 (PMI). Suppose S ⊆ N has the properties that 

(I1) 1 ∈ S 
(I2) If n ≥ 1 and n ∈ S, then n + 1 ∈ S. 

Then S = N. 


The proof of the PMI is a great example of proof by contradiction. If you'd like to try 
to prove it yourself, then take a glance at the hints¹⁶,¹⁷,¹⁸,¹⁹,²⁰ if you need to. Here is the 
proof. 


Proof: Suppose S ⊆ N satisfies properties I1–I2. Suppose also that S ≠ N. Then 
either S ⊈ N or N ⊈ S. But S ⊆ N is assumed, so N ⊈ S. Thus there exists 
n ∈ N such that n ∉ S. If we define T = N \ S = N ∩ S′, then T ⊆ N and n ∈ T. 
Thus T is a nonempty subset of the natural numbers, which, by the WOP, contains a 
smallest element a. Now it is impossible that a = 1 because 1 ∈ S. Thus a > 1, so that 
a − 1 ∈ N. Furthermore, since a is the smallest element of T, it must be that a − 1 ∉ T. 
Thus a − 1 ∈ S. But by I2, since a − 1 ∈ S, it follows that a = (a − 1) + 1 ∈ S. But 
if a ∈ S, then a ∉ T. This is a contradiction. Thus there is no smallest element of T, 
which means that T = ∅. Therefore, N ⊆ S, from which S = N. ∎ 


¹⁶The theorem says [(S ⊆ N) ∧ I1 ∧ I2] → (S = N). Suppose this is false. 

¹⁷State N ⊈ S in ∃ form. 

¹⁸Let T = N \ S. What does the WOP say about T? 

¹⁹If T has a smallest element a, what can you say about a − 1? 

²⁰But if a − 1 ∈ S, then what is true of a? 

What does Theorem 2.4.1 have to do with proving the kinds of formulas in Eqs. (2.52) 
and (2.53)? Think of Eqs. (2.52) and (2.53) as statements about a natural number n. They 
say, in effect, that a certain formula or statement is true for all n ∈ N. To be as general 
as possible, address such a statement by P(n). Let’s define S to be the set of all natural 
numbers n for which the statement P(n) is true. The trick is to show that S has properties 
I1–I2. Then the PMI will allow us to conclude that S = N, which is the same as saying 
P(n) is true for all n ∈ N. First, we show that 1 ∈ S by showing P(1) is true. Then we 
show that n ∈ S → n + 1 ∈ S by supposing P(n) is true, and using this to show that 
P(n + 1) is true. The assumption that n ∈ S is called the inductive assumption, and the 
part of the proof where we show n ∈ S → n + 1 ∈ S is called the inductive step. Having 
shown that S has properties I1–I2, we can then conclude S = N; that is, P(n) holds 
true for all n ∈ N. Here’s a sample theorem. 


Theorem 2.4.2 (Sample). For all n ∈ N, P(n). 

Proof: We use induction on n ≥ 1. 
(I1) P(1). 
(I2) Suppose n ≥ 1 and P(n). Then.... Thus, P(n + 1). 
Therefore, by induction, P(n) is true for all n ∈ N. ∎ 

There will be some point in step I2 where you use the inductive assumption P(n) 
to get you over the hump of showing P(n + 1). Study the two proofs by induction that 
follow, and notice where the inductive assumption is used. 


Theorem 2.4.3. For all n ∈ N, 

\[ \sum_{k=1}^{n} k = \frac{n(n+1)}{2}. \tag{2.54} \]

Proof: We use induction on n ≥ 1. 

(I1) For the case n = 1, we have $\sum_{k=1}^{1} k = 1$ and 1(1 + 1)/2 = 1, so that Eq. (2.54) 
holds true for n = 1. 

(I2) Suppose n ≥ 1 and $\sum_{k=1}^{n} k = n(n+1)/2$. Then 

\[ \sum_{k=1}^{n+1} k = \sum_{k=1}^{n} k + (n+1) = \frac{n(n+1)}{2} + (n+1) = \frac{n^2 + n + 2n + 2}{2} = \frac{n^2 + 3n + 2}{2} = \frac{(n+1)(n+2)}{2}. \]

Thus Eq. (2.54) holds for n + 1. 

By the PMI, it follows that Eq. (2.54) holds for all n ∈ N. ∎ 

Before we prove Eq. (2.53), we need to make an observation first. In Exercise 6 of 
Section 2.1, you showed that 

\[ A \cup (B \cap C) = (A \cup B) \cap (A \cup C). \tag{2.55} \]

This is the special case of Eq. (2.53) where n = 2. We need this fact in proving the inductive 
step in the following. 

Theorem 2.4.4. Suppose A, B₁, B₂, ..., Bₙ are sets. Then 

\[ A \cup (B_1 \cap B_2 \cap \cdots \cap B_n) = (A \cup B_1) \cap (A \cup B_2) \cap \cdots \cap (A \cup B_n). \tag{2.56} \]

Proof: To clean up the notation, we write Eq. (2.56) as 

\[ A \cup \Bigl[ \bigcap_{k=1}^{n} B_k \Bigr] = \bigcap_{k=1}^{n} (A \cup B_k). \tag{2.57} \]

(I1) If n = 1, then $\bigcap_{k=1}^{1} B_k = B_1$, so that both sides of Eq. (2.57) are simply $A \cup B_1$. 

(I2) Suppose n ≥ 1 and Eq. (2.57) holds for n. Then 

\begin{align*}
A \cup \Bigl[ \bigcap_{k=1}^{n+1} B_k \Bigr] &= A \cup \Bigl[ \Bigl( \bigcap_{k=1}^{n} B_k \Bigr) \cap B_{n+1} \Bigr] \\
&\overset{(2.55)}{=} \Bigl[ A \cup \Bigl( \bigcap_{k=1}^{n} B_k \Bigr) \Bigr] \cap (A \cup B_{n+1}) \\
&= \Bigl[ \bigcap_{k=1}^{n} (A \cup B_k) \Bigr] \cap (A \cup B_{n+1}) = \bigcap_{k=1}^{n+1} (A \cup B_k).
\end{align*}

Thus, Eq. (2.57) holds for n + 1. 

By the PMI, Eq. (2.57) holds for all n ∈ N. ∎ 
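
To see the identity of Theorem 2.4.4 on concrete sets, here is a small Python sketch. The sets `A` and `Bs` and the helper names `left_side` and `right_side` are made up for illustration; one example proves nothing, but it is a quick way to catch a misread formula.

```python
def left_side(A, Bs):
    """A union (B_1 intersect B_2 intersect ... intersect B_n)."""
    return A | set.intersection(*Bs)


def right_side(A, Bs):
    """(A union B_1) intersect ... intersect (A union B_n)."""
    return set.intersection(*[A | B for B in Bs])


A = {1, 2}
Bs = [{2, 3, 4}, {3, 4, 5}, {4, 5, 6}]

print(left_side(A, Bs))                       # {1, 2, 4}
print(left_side(A, Bs) == right_side(A, Bs))  # True
```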


2.4.2 Variation of the PMI 


There is a natural and useful variation of the standard PMI that results from making one 
slight change. In defining S in Section 2.4.1, there was nothing magical about 
making 1 the first element of S. We could have supposed that j is any whole number and 
that S ⊆ W has the following properties. 

(J1) j ∈ S 
(J2) If n ≥ j and n ∈ S, then n + 1 ∈ S. 

By mimicking almost word for word the proof of Theorem 2.4.1, we would have a 
similar PMI rooted, if you will, at j ∈ W. 


Theorem 2.4.5 (PMI, Version 2). Suppose S ⊆ W has properties J1–J2. Then S ⊇ {j, j + 1, j + 2, ...} = {n ∈ W : n ≥ j}. 


The reason we might want Theorem 2.4.5 is that a theorem involving n ∈ W might 
not be true for all n, but only eventually true, that is, true for n ≥ j for some j ∈ W. For 
example, the formula 

\[ 2^n > n^2 \tag{2.58} \]

is true for n ≥ 5. You will prove this in Exercise 12. 
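
A quick experiment shows why an induction for the inequality 2^n > n^2 has to be rooted at 5 rather than at 0 or 1. The sketch below is only a spot check over a small range; the helper name `holds` is an illustrative choice.

```python
def holds(n: int) -> bool:
    """Test the inequality 2^n > n^2 for a single whole number n."""
    return 2 ** n > n ** 2


# The inequality is true for n = 0 and n = 1, then fails three times in a row.
print([n for n in range(12) if not holds(n)])  # [2, 3, 4]
print(all(holds(n) for n in range(5, 200)))    # True
```

Notice that the statement is true at n = 0 and n = 1 but the inductive step cannot carry it past n = 2, which is exactly the situation Theorem 2.4.5 is designed for.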

Let’s present an example of this variation of proof by induction just to make the 
general technique of induction more familiar. For x ∈ R \ {0} and n ∈ W, make the 
following definition: 

\[ x^0 = 1, \qquad x^{n+1} = x^n \cdot x \quad \text{for } n \ge 0. \tag{2.59} \]

This is the standard way of defining exponentiation. Instead of saying something like 
x⁹ = x ··· x nine times, we say x⁹ = x⁸ · x. This might sound silly, but it’s an example 
of what is called a recursive definition, where the initial case is defined concretely, and 
then later cases are defined in terms of earlier ones. This definition of exponentiation has 
some familiar behaviors. If a, b ∈ R \ {0} and m, n ∈ W, then 

\[ a^m \cdot a^n = a^{m+n} \tag{2.60} \]
\[ (a^m)^n = a^{mn} \tag{2.61} \]
\[ (ab)^n = a^n b^n. \tag{2.62} \]


Notice we’re not allowing the exponents to be negative at this point. You'll address that 
in Exercise 7. Let’s prove Eq. (2.60) here, and leave the others to you in Exercise 6. 
Theorem 2.4.6. Suppose a ∈ R \ {0} and m, n ∈ W. Then $a^m \cdot a^n = a^{m+n}$. 

Proof: We prove by induction on n ≥ 0 (thinking of m as fixed). 

(J1) For the case n = 0, we have $a^m \cdot a^0 = a^m \cdot 1 = a^m = a^{m+0}$. Thus the result is 
true for n = 0. 

(J2) Suppose n ≥ 0 and $a^m \cdot a^n = a^{m+n}$. We show that $a^m \cdot a^{n+1} = a^{m+(n+1)}$: 

\[ a^m \cdot a^{n+1} = a^m \cdot (a^n \cdot a) = (a^m \cdot a^n) \cdot a = a^{m+n} \cdot a = a^{(m+n)+1} = a^{m+(n+1)}. \]

Thus by induction, $a^m \cdot a^n = a^{m+n}$ for all a ∈ R \ {0} and m, n ∈ W. ∎ 
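
A recursive definition translates almost word for word into a recursive function. The sketch below assumes the initial case a⁰ = 1 used in step J1 of the proof; the name `power` is hypothetical, and exact fractions are used so that the comparison is not disturbed by rounding.

```python
from fractions import Fraction


def power(x, n: int):
    """Compute x^n for n in W = {0, 1, 2, ...} from the recursive definition."""
    if n == 0:
        return 1                    # initial case: x^0 = 1
    return power(x, n - 1) * x      # later case: x^n = x^(n-1) * x


print(power(2, 9))  # 512

# Spot check of the law a^m * a^n = a^(m+n) on a small grid of exponents.
a = Fraction(-3, 2)
print(all(power(a, m) * power(a, n) == power(a, m + n)
          for m in range(6) for n in range(6)))  # True
```

The function only ever appeals to the two lines of the definition, just as the proof of Theorem 2.4.6 does.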


Another example of a recursive definition is the factorial. 


\[ 0! = 1, \qquad (n+1)! = (n+1) \cdot n! \quad \text{for } n \ge 0. \tag{2.63} \]


In Exercise 13, you'll prove some formulas involving sums of factorials. 
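
The factorial can be sketched the same way. This is a minimal illustration of the recursive definition, with the standard library's `math.factorial` used only as an independent comparison; the name `factorial` for our own function is an illustrative choice.

```python
import math


def factorial(n: int) -> int:
    """Compute n! from 0! = 1 and (n + 1)! = (n + 1) * n!."""
    if n == 0:
        return 1
    return n * factorial(n - 1)


print([factorial(n) for n in range(6)])  # [1, 1, 2, 6, 24, 120]
print(all(factorial(n) == math.factorial(n) for n in range(20)))  # True
```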


2.4.3 Strong induction 


There is another way to build the set S from Section 2.4.1. Consider the following. 
Suppose S ⊆ N has the following properties. 

(K1) 1 ∈ S 
(K2) If n ≥ 2 and 1, 2, ..., n − 1 ∈ S, then n ∈ S. 


Then by yet another mimicking of the proof of Theorem 2.4.1, we could show that 
S = N. This result is called the strong principle of mathematical induction, or SPMI. 


Theorem 2.4.7 (SPMI). Suppose S ⊆ N has the properties K1–K2 above. Then S = N. 


Before we consider an example of the usefulness of the SPMI, let’s look at how it 
appears to be different from the standard PMI. If we’re required to prove a theorem 
involving n € N and induction seems to be the way to go, we might find that regular 
induction does not provide us with a strong enough assumption to make the inductive 
leap. With either form of induction, we would still need to show that 1 ∈ S. But to make 
the inductive step, regular induction would only allow us to assume n ∈ S and require 
us to conclude n + 1 ∈ S solely from this. On the other hand, strong induction allows 
us to assume that all of 1, 2, ..., n − 1 ∈ S, and requires us to conclude n ∈ S from this 
more extensive set of assumptions. 

A natural number p ≥ 2 is said to be prime if it has exactly two factors in N. Thus its 
only natural number factors are 1 and p. A number that is not prime is called composite, 
and such a number n can be written in the form n = ab, where a, b ∈ N and 1 < a, b < n. 
The following theorem addresses the factorization of natural numbers into a product of 
primes. We use a slight variation of the SPMI by rooting it at n = 2. 


Theorem 2.4.8 (Fundamental Theorem of Arithmetic). Every natural number n ≥ 2 
can be written as the product of primes, and this factorization is unique, except perhaps for 
the order in which the factors are written. 


Proof: We'll prove only the existence part now, and save uniqueness for Section 2.7. 
Since 2 is prime, the result is true for n = 2. Thus let n ≥ 3 be given, and suppose all 
of 2, 3, ..., n − 1 can be written as the product of primes. Now if n is itself prime, 
then its prime factorization is trivial, and the result is true. On the other hand, if n 
is composite, then there exist a, b ∈ N such that 1 < a, b < n and n = ab. By the 
inductive assumption, since 2 ≤ a, b ≤ n − 1, both a and b can be written as the 
product of primes. That is, 

\[ a = p_1 p_2 \cdots p_s, \qquad b = q_1 q_2 \cdots q_t, \]

where all $p_k$ and $q_k$ are prime. Thus $n = p_1 \cdots p_s q_1 \cdots q_t$, and we have found a 
prime factorization for n. ∎ 
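
The existence argument has the same shape as a recursive function: if n is prime we are done, and if n = ab is composite we factor the two smaller numbers a and b. Here is a sketch under that reading. The name `prime_factors` is hypothetical, and searching for a divisor up to the integer square root is an implementation convenience that is not part of the proof.

```python
from math import isqrt


def prime_factors(n: int) -> list:
    """Return primes whose product is n, for a natural number n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    for a in range(2, isqrt(n) + 1):
        if n % a == 0:
            b = n // a
            # Composite case: n = a * b with 1 < a, b < n, so both
            # recursive calls are on strictly smaller numbers.
            return prime_factors(a) + prime_factors(b)
    # No divisor strictly between 1 and n was found, so n is prime.
    return [n]


print(prime_factors(360))  # [2, 2, 2, 3, 3, 5]
print(prime_factors(97))   # [97]
```

The recursion stops for the same reason strong induction works: every call is made on a number smaller than n and no smaller than 2.
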
EXERCISES 


1. Show that the smallest element of a nonempty subset of W is unique. 


2. Prove the following sum formulas: 
(a) $\sum_{k=1}^{n} k^2 = n(n+1)(2n+1)/6$ 
(b) $\sum_{k=1}^{n} (-1)^k k^2 = (-1)^n n(n+1)/2$ 
ORS ay: 

(d) $\sum_{k=1}^{n} (2k-1) = n^2$ 

(ce) 3 2h oy 

(f) $\sum_{k=1}^{n} \frac{1}{k(k+1)} = n/(n+1)$ 


3. Suppose n ∈ N and a, b₁, b₂, ..., bₙ ∈ R. Show that 

\[ a \sum_{k=1}^{n} b_k = \sum_{k=1}^{n} (a b_k). \tag{2.64} \]

4. Suppose m, n ∈ N and a₁, a₂, ..., aₘ, b₁, b₂, ..., bₙ ∈ R. Show that 

\[ \Bigl( \sum_{j=1}^{m} a_j \Bigr) \Bigl( \sum_{k=1}^{n} b_k \Bigr) = \sum_{j=1}^{m} \Bigl( \sum_{k=1}^{n} a_j b_k \Bigr). \tag{2.65} \]

If m = n = 2, this is the FOIL technique of multiplying two binomials.²¹ 

5. Suppose a₁, a₂, ..., aₙ ∈ R. Show that $\bigl| \sum_{k=1}^{n} a_k \bigr| \le \sum_{k=1}^{n} |a_k|$. 

6. Prove Eqs. (2.61) and (2.62) for a, b ∈ R \ {0} and m, n ∈ W. 

7. For x ∈ R \ {0} and n ∈ N, define 

\[ x^{-n} = (x^{-1})^n. \tag{2.66} \]

This exercise will show that the rules for exponents in Eqs. (2.60)–(2.62) hold for all 
a, b ∈ R \ {0} and m, n ∈ Z. 

(a) Show that $a^{-n} = (a^n)^{-1}$ for all n ∈ W. That is, the expression $a^{-n}$ behaves like 
the inverse of $a^n$, allowing us to write $a^{-n} \cdot a^n = 1$.²² 

(b) Show that $a^{-m} a^{-n} = a^{-m-n}$ for all m, n ∈ W. 

(c) Show that $a^m a^{-n} = a^{m-n}$ for all m, n ∈ W.²³ 

(d) Show that $(a^{-m})^{-n} = a^{mn}$ for all m, n ∈ W. 

(e) Show that $(a^m)^{-n} = a^{-mn}$ for all m, n ∈ W. 

(f) Show that $(a^{-m})^n = a^{-mn}$ for all m, n ∈ W. 

(g) Show that $(ab)^{-n} = a^{-n} b^{-n}$ for all n ∈ W. 


8. Suppose a ∈ R \ {0} and n ∈ N. Prove the following:²⁴ 
(a) $(-a)^{2n} = a^{2n}$ 
(b) $(-a)^{2n+1} = -a^{2n+1}$ 

9. Prove the following for n ∈ N: 
(a) If 0 ≤ a < b, then $a^n < b^n$.²⁵ 


²¹This should be a one-liner with the help of Exercise 3. 

²²Only this part will require induction. 

3 Take a hint from a’ - a~® = a? - a® -a~® = a? = a®6 anda? - a8 = a? «a? -a~® = a~*®> = a. 
²⁴You can prove these without induction if you call on previous Exercises appropriately. 

²⁵In the inductive step, Exercise 9g from Section 2.3 takes care of the case 0 < a < b. Consider the case 
a = 0 separately. 
(b) If x > 1, then $x^n > 1$. 

(c) If a < b < 0, then 
    i. $a^{2n} > b^{2n}$.²⁶ 
    ii. $a^{2n+1} < b^{2n+1}$. 

(d) If a < 0 < b, then $a^{2n+1} < b^{2n+1}$.²⁷ 

10. By Exercise 4, polynomial multiplication will reveal that 

\[ (1 - x)(1 + x + x^2 + x^3 + x^4 + x^5) = 1 - x^6. \]

If x ≠ 1, we may divide both sides through by 1 − x to have 

\[ 1 + x + x^2 + x^3 + x^4 + x^5 = \frac{1 - x^6}{1 - x}. \]

This algebraic observation suggests a general formula for the sum of powers of a real 
number x ≠ 1. If n ≥ 0, then 

\[ 1 + x + x^2 + \cdots + x^n = \sum_{k=0}^{n} x^k = \frac{1 - x^{n+1}}{1 - x}. \tag{2.67} \]

Prove Eq. (2.67) with an induction argument rooted at n = 0. 

11. Use Exercise 10 to prove the following factorization formula:²⁸ 

\[ a^{n+1} - b^{n+1} = (a - b)(a^n + a^{n-1} b + a^{n-2} b^2 + \cdots + a^2 b^{n-2} + a b^{n-1} + b^n). \tag{2.68} \]

12. If n ≥ 3, then n² = n · n ≥ 3n = 2n + n > 2n + 1. Use this and induction to prove 
$2^n > n^2$ for all n ≥ 5. 

13. Prove the following for n ≥ 0. 

(a) $\sum_{k=0}^{n} \frac{k}{(k+1)!} = 1 - \frac{1}{(n+1)!}$ 
(b) $\sum_{k=0}^{n} k \cdot k! = (n+1)! - 1$ 


2.5 Equivalence Relations: The Idea of Equality 


2.5.1 Analyzing equality 


What does it mean to say that two things are equal? Consider the following: 


\[ 8 = 8 \tag{2.69} \]
\[ \frac{3}{8} = \frac{9}{24} \tag{2.70} \]
\[ 2.7999999\ldots = 2.8. \tag{2.71} \]

²⁶Use the fact that 0 < −b < −a and Exercise 8. 
²⁷Use Exercise 9i from Section 2.3. 
²⁸Don’t make another induction argument. Letting x = a/b in Eq. (2.67) provides a good start. 

Perhaps it seems silly even to ask what Eq. (2.69) means. After all, doesn’t everyone 
know that something is always equal to itself? But what about Eq. (2.70)? You might be 
thinking that these two fractions are equal because cross multiplying yields 72 = 72. Thus 
Eq. (2.70) is true because we can trade it for another equation comparable to Eq. (2.69). 
Then what about Eq. (2.71)? Do you remember the trick from junior high where you 
convert a repeating decimal to a ratio of integers? If you let x = 2.7999999..., then you can 
say 

\[ 100x = 279.999999\ldots \]
\[ 10x = 27.999999\ldots, \]

which by subtraction yields 

\[ 90x = 252.000000 \]
\[ x = \frac{252}{90} = \frac{28}{10} = 2.8. \]

You claim that Eq. (2.71) is true because you can reshape one side into the other by the 
rules of algebra. 
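
If you'd like to check the arithmetic exactly, Python's `fractions` module works with ratios of integers and avoids rounding. The sketch below is an illustration only: the helper `truncation` is a hypothetical name, and it shows how far a decimal with finitely many 9s falls short of 2.8.

```python
from fractions import Fraction

x = Fraction(252, 90)           # from 90x = 252
print(x)                        # 14/5
print(x == Fraction(28, 10))    # True


def truncation(nines: int) -> Fraction:
    """The finite decimal 2.79...9 with the given number of 9s after the 7."""
    return Fraction(27, 10) + sum(
        Fraction(9, 10 ** (i + 1)) for i in range(1, nines + 1)
    )


# With finitely many 9s there is always a gap, and it shrinks by a
# factor of 10 with every additional 9.
print(Fraction(28, 10) - truncation(1))  # 1/100
print(Fraction(28, 10) - truncation(6))  # 1/10000000
```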

In general, we ask the following question. Given any set S, and two elements x, y ∈ S, 
what does it mean to say x = y? What are the fundamental properties of this thing we 
call equality that should apply in all contexts, regardless of whether the elements of S are 
numbers, functions, sets, and so forth? This section is designed to address this question. 
Once we understand these fundamental properties, we might even be able to use some 
imagination and create some entirely new types of equality. 

Before we begin this process, we need to be aware that the symbol for equality can 
vary greatly from situation to situation. Depending on what kind of things x and y 
are, there are different symbols commonly used to denote that x and y are equal. For 
example, 


x = y 
x ≡ y 
x ≡ₙ y 
x ≅ y 
x ≈ y 
x ↔ y 
x R y 


are but a few. The statement x R y is read “x is related to y,” and derives from a way of 
addressing equality in terms of a relation, which we’ll investigate in Section 2.9. Since our 
goal is to address the features of equality that transcend context, let’s choose one of these 
common symbols, say, ≡, and stick with it. 

Here are three properties of equality that we would probably expect to be true in any 
context. For a set S, we would expect that 

(E1) x ≡ x for all x ∈ S (Reflexive property); 
(E2) If x ≡ y, then y ≡ x (Symmetric property); 
(E3) If x ≡ y and y ≡ z, then x ≡ z (Transitive property). 


Notice that these are the same properties of equality that we assumed for R in assumption 
A1 from Chapter 0. These three properties are what equality is all about in any 
context. They are so important that we make the following definition: 


Definition 2.5.1. Let S be a nonempty set, and let “x ≡ y” be a statement for all x, y ∈ S. 
That is, for every x, y ∈ S, either x ≡ y or x ≢ y is true. Then ≡ is said to be an equivalence 
relation on S if properties E1–E3 are true for all elements of S. 


Example 2.5.2. Let C be the set of all cities on earth. Suppose for the sake of argument 
that there are no one-way roads. Define two cities x, y ∈ C to be equivalent, x ≡ y, if it 
is possible to drive on roads from city x to city y. Is this definition of ≡ an equivalence 
relation on C? 


Solution: We must verify that properties E1–E3 hold. 

(E1) Let x ∈ C. Since it is possible to drive from city x to city x on roads, then x ≡ x. 

(E2) Suppose x ≡ y. Then it is possible to drive from x to y on roads. Since we’ve 
assumed no roads are one-way, then it is also possible to drive from y to x, since 
the same roads going from x to y can be traveled in the opposite direction. Thus, 
y ≡ x. 

(E3) Suppose x ≡ y and y ≡ z. Then it’s possible to drive from x to y on roads, and 
it’s possible to drive from y to z on roads. By beginning at x, driving to y, then 
to z, it’s possible to drive from x to z on roads. Thus, x ≡ z. 

Since ≡ satisfies properties E1–E3, we have that ≡ is an equivalence relation. ∎ 


Although Example 2.5.2 seems on the surface not to be particularly mathematical, it 
illustrates the general form for proving that some definition of ≡ is an equivalence 
relation. It illustrates the process of defining a term and applying that definition in a 
new context, a process that should be getting clearer to you. Here is a sample definition 
and theorem to drive the point home. We use the expression P(x, y) to represent some 
statement involving x and y. 


Definition 2.5.3 (Sample). For a set Y and for x, y ∈ Y, define x ≡ y if P(x, y). 


Theorem 2.5.4 (Sample). The equivalence ≡ in Definition 2.5.3 is an equivalence relation 
on Y. 


Proof: We show that properties E1–E3 hold for ≡ on Y. 

(E1) Pick x ∈ Y. Then ... so that P(x, x) is true. Thus x ≡ x. 

(E2) Suppose x ≡ y. Then P(x, y). Thus ..., so that P(y, x). Therefore, y ≡ x. 

(E3) Suppose x ≡ y and y ≡ z. Then P(x, y) and P(y, z). Then ..., so that P(x, z). 
Thus x ≡ z. 

Since ≡ satisfies properties E1–E3, we have that ≡ defines an equivalence relation 
on Y. ∎ 


Example 2.5.5. (A very important example). For x, y ∈ Z, define x ≡₆ y if there 
exists k ∈ Z such that x − y = 6k. This is to be read, “x is congruent to y, modulo 6 (or 
mod 6).” This definition says that x ≡₆ y if x − y is divisible by 6. Show that ≡₆ defines 
an equivalence relation on Z. 


Solution: Details are left to you in Exercise 1. Here’s the skeleton. 

(E1) Pick x ∈ Z, and let k = ?. Then x − x = 0 = 6k. 

(E2) Suppose x ≡₆ y. Then there exists k₁ ∈ Z such that x − y = 6k₁. Let k₂ = .... 

(E3) Suppose x ≡₆ y and y ≡₆ z. Then there exist k₁, k₂ ∈ Z such that.... ∎ 


Example 2.5.5 is particularly important in algebra and pervades Part III of this text. 
The number 6 was chosen arbitrarily, of course. We can discuss ≡ₙ for any n ∈ N, defining 
x ≡ₙ y if and only if there exists k ∈ Z such that x − y = nk. Notationally, there are two 
other common ways of writing this form of equivalence among integers. They are 

\[ x \equiv y \pmod{n} \tag{2.72} \]
\[ x \equiv y \ (n). \tag{2.73} \]
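
Congruence modulo n is easy to experiment with. The sketch below defines a hypothetical helper `congruent` directly from the definition (n divides x − y) and spot-checks the reflexive, symmetric, and transitive properties on a finite window of integers. A finite check like this is no substitute for the proof in Exercise 1, which must handle all of Z.

```python
def congruent(x: int, y: int, n: int = 6) -> bool:
    """Return True when x - y = n*k for some integer k."""
    return (x - y) % n == 0


window = range(-12, 13)  # a finite sample of Z, chosen only for illustration

reflexive = all(congruent(x, x) for x in window)
symmetric = all(congruent(y, x)
                for x in window for y in window if congruent(x, y))
transitive = all(congruent(x, z)
                 for x in window for y in window for z in window
                 if congruent(x, y) and congruent(y, z))

print(reflexive, symmetric, transitive)    # True True True
print(congruent(17, 5), congruent(17, 4))  # True False
```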


Example 2.5.6. Suppose we have an equilateral triangle, and we write the numbers 
{1, 2, 3}, one on each corner. Call one such writing of these numbers an assignment. 
How many assignments are there? For two assignments x and y, define x ≡ y if it is 
possible to convert x into y by rotating and/or flipping the triangle over. Does ≡ define 
an equivalence relation on the set of all assignments? 


Solution: First, since there are three corners and three numbers to assign, we can 
count six assignments. (Any one of three numbers can be written on the top corner, 
then any one of two remaining numbers can be written on the lower left corner, 
and the only remaining number is written on the lower right corner. This yields 
3 × 2 × 1 = 6 assignments.) We verify that ≡ satisfies E1–E3. 

(E1) By rotating the triangle 360° and flipping it six times, we see that x ≡ x. 

(E2) Suppose x ≡ y. Then it is possible to convert x into y by some combination of 
rotation and/or flip. By exactly reversing the process of converting x to y, we 
can convert y to x. Thus y ≡ x. 

(E3) Suppose x ≡ y and y ≡ z. Then x can be converted into y, and y can be 
converted into z by rotations and/or flips. By converting x into y, then y into 
z in succession, it is possible to convert x into z. Thus x ≡ z. 

Since ≡ satisfies E1–E3, ≡ defines an equivalence relation on the set of all 
assignments. ∎ 


Example 2.5.7. Let S be the set of all school buildings in the United States. Say that two 
schools x and y are equivalent, that is, x ≡ y, if the height of the flagpole in front of 
school x is within one foot of the height of the flagpole in front of school y. Does this 
definition of equivalence constitute an equivalence relation? 


Solution: Although E1 and E2 are satisfied, E3 is not. We can show that the statement 
“x ≡ y and y ≡ z implies x ≡ z” is a false statement. What is the logic behind this 
demonstration?²⁹ We demonstrate the existence of a counterexample to transitivity 
where x ≡ y and y ≡ z, but x ≢ z. Let the heights of the flagpoles in front of schools x, 
y, and z be 29, 29.6, and 30.3 ft., respectively. Then, clearly, x ≡ y and y ≡ z, but 
x ≢ z. Since transitivity fails, ≡ is not an equivalence relation. ∎ 
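
The counterexample is small enough to check by machine. In this sketch the helper name `within_one_foot` is hypothetical, and the three heights are the ones used in the solution.

```python
def within_one_foot(height_x: float, height_y: float) -> bool:
    """Two schools are related when their flagpole heights differ by at most 1 ft."""
    return abs(height_x - height_y) <= 1


x, y, z = 29.0, 29.6, 30.3

print(within_one_foot(x, y))  # True
print(within_one_foot(y, z))  # True
print(within_one_foot(x, z))  # False
```

One failing triple is all it takes, because transitivity is a claim about every choice of x, y, and z.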


2.5.2 Equivalence classes 


In Example 2.5.2, it might have occurred to you that defining two cities to be equivalent 
as we did takes all cities on earth and lumps them together into groups that are mutually 
accessible from each other. For example, all cities on the north island of New Zealand are 
equivalent to each other because there is (presumably) a network of roads connecting 
them all. Furthermore, no city outside the north island of New Zealand is equivalent 
to any city on the island. If it were, say, for example, by some newly constructed bridge 
between Auckland, NZ, and Santiago, Chile, then Santiago would be equivalent to all 
the cities on the north island of New Zealand. Furthermore, all cities accessible from 
Santiago would also become equivalent to all cities on the north island of New Zealand. 
It’s sort of like the molecules in two drops of water. Either the two drops do not touch at 
all, or, if they do, they instantly merge into one drop. 

This illustrates that there is a lot of strength in the properties E1-E3. One very 
important feat that an equivalence relation defined on S performs is that it completely 
splits S up into very nice, nonempty, nonoverlapping subsets. In general, the splitting of 
a set into nonoverlapping subsets is called partitioning. When an equivalence relation is 
defined on a set S, it naturally partitions S into subsets where elements of the same subset 
are all equivalent to each other, and elements of different subsets are not equivalent. Each 
subset of this sort is called an equivalence class. First, we'll define a partition and look at 
an example. Then we’ll see how an equivalence relation gives rise to a partition. 


Definition 2.5.8. Suppose S is a set, and F = {A} is a family of subsets of S. Then F is 
said to be a partition of S if 

(P1) A ≠ ∅ for all A ∈ F; 
(P2) If A, B ∈ F, and A ∩ B ≠ ∅, then A = B; and 
(P3) $\bigcup_{A \in F} A = S$. 


²⁹What is the negation of (p ∧ q) → r? 




Let’s illustrate with a specific example before we discuss general properties of partitions. 
Just remember, a partition of a given set S is nothing but a family of subsets of S 
with some special properties. 


Example 2.5.9. For N₁₀, consider the following four families of subsets: 

\begin{align*}
F_1 &= \{\{1\}, \{2, 3, 5, 7\}, \{4, 6, 8, 10\}, \{9\}\} \\
F_2 &= \{\emptyset, \{1, 2, 3, 4\}, \{5, 6, 7, 8\}, \{9, 10\}\} \\
F_3 &= \{\{1\}, \{2, 4, 6, 8, 10\}, \{3, 6, 9\}, \{5, 10\}, \{7\}\} \\
F_4 &= \{\{\text{Primes in } N_{10}\}, \{\text{Composites in } N_{10}\}\}. \tag{2.74}
\end{align*}

Of these four families, only F₁ is a partition of N₁₀. Notice how F₁ satisfies properties 
P1–P3. First, no set in F₁ is empty. Second, they are all disjoint. If we choose any two of 
them and they have a nonempty intersection, then they are the same set. Third, forming 
the union across F₁ produces N₁₀. 

Notice how the other families fail to be a partition of N₁₀. Since a partition of a set 
must contain only nonempty subsets, F₂ fails to be a partition of N₁₀. In F₃, there exist 
two distinct sets with nonempty intersection. Finally, the union across F₄ is not N₁₀ for 1 
is neither prime nor composite. 
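
Properties P1–P3 can be tested mechanically on finite families. Here is a sketch: the function name `is_partition` is hypothetical, each family is written as a Python list of sets, and the primes and composites in N₁₀ are listed out by hand.

```python
def is_partition(family, S) -> bool:
    """Check properties P1-P3 for a finite family of subsets of S."""
    p1 = all(len(A) > 0 for A in family)                  # no empty sets
    p2 = all(A == B or A.isdisjoint(B)
             for A in family for B in family)             # overlap forces equality
    p3 = set().union(*family) == set(S)                   # the union is all of S
    return p1 and p2 and p3


N10 = set(range(1, 11))
F1 = [{1}, {2, 3, 5, 7}, {4, 6, 8, 10}, {9}]
F2 = [set(), {1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10}]
F3 = [{1}, {2, 4, 6, 8, 10}, {3, 6, 9}, {5, 10}, {7}]
F4 = [{2, 3, 5, 7}, {4, 6, 8, 9, 10}]   # primes and composites in N10

print([is_partition(F, N10) for F in (F1, F2, F3, F4)])
# [True, False, False, False]
```

Each of the three failing families breaks exactly one property: F2 fails P1, F3 fails P2, and F4 fails P3.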


In the work we'll do in what follows, there will be some program by which we 
construct subsets of a given S in order to create a partition. To verify that this program 
does indeed partition S, we'll need to show that each of the properties P1–P3 is satisfied 
on F. 


(P1) The program for forming the subsets of S must always generate nonempty subsets. 
Thus if we choose an arbitrary set in the family, we must be able to find some 
element of S in the chosen set. 


(P2) This property says that if A and B overlap at all (A ∩ B ≠ ∅), then they really are 
the same set, so that no x ∈ S can be in more than one distinct set A ∈ F. Thus if 
we suppose there exists some z ∈ A ∩ B, we must be able to show A = B. 


(P3) This property says that the generated sets in the family completely exhaust all 
elements of S. Naturally, the program for forming the subsets of S should be defined 
so that it creates only subsets of S. Then Exercise 5 from Section 2.2 guarantees that 
the ⊆ part of P3 is satisfied. What remains to be shown is ⊇, which amounts to 
showing that for every x ∈ S there exists some A in the family F such that x ∈ A. 


Example 2.5.10. For C in Example 2.5.2, choose a city x ∈ C and define $A_x$ to be the 
set of all cities accessible from city x. Show that $F = \{A_x : x \in C\}$ is a partition of C. 

Solution: We verify that properties P1–P3 are satisfied on F. 

(P1) Pick any A ∈ F. Then there exists x ∈ C for which $A = A_x$. Since city x is 
accessible from itself, we have that x ∈ A, so that A ≠ ∅. 

(P2) Pick $A_x, A_y \in F$, and suppose $A_x \cap A_y \ne \emptyset$. Then there exists $z \in A_x \cap A_y$. 
We show that $A_x = A_y$. First, pick $a \in A_x$. Then city a is accessible from city 
x, and since $z \in A_x$, z is accessible from x. (Draw a picture!) Furthermore, 
since $z \in A_y$, it follows that z is accessible from y. Thus, a is accessible from y 
by traveling y → z → x → a, so that $a \in A_y$ and $A_x \subseteq A_y$. By an identical 
argument in the reverse direction $A_y \subseteq A_x$. Thus, $A_x = A_y$. 

(P3) Since $A_x \subseteq C$ for all x ∈ C, Exercise 5 from Section 2.2 implies that 
$\bigcup_{x \in C} A_x \subseteq C$. To show ⊇, pick any y ∈ C. We must find some set in F 
that contains y. But $y \in A_y$, so this is clearly true. Thus $y \in \bigcup_{x \in C} A_x$, and 
$C \subseteq \bigcup_{x \in C} A_x$. ∎ 


In showing P3 in Example 2.5.10, notice there is some redundancy in forming 
the union $\bigcup_{x \in C} A_x$. For example, Wellington, another city on the north island of New 
Zealand, is in $A_{\text{Wellington}}$, but it’s also in $A_{\text{Auckland}}$. Thus dumping all cities in each $A_x$ 
together to form $\bigcup_{x \in C} A_x$ means Wellington is tossed in more than once. But that’s all 
right. All we care about is that every city is tossed in at least once. 

Now let’s see how an equivalence relation on a nonempty set S is tied to a partition 
of S. We'll work our way into a theorem, tying them together by way of some examples. 
First, a definition. 


Definition 2.5.11. Suppose ≡ defines an equivalence relation on a nonempty set S. For an 
element x ∈ S, define 

\[ [x] = \{ y \in S : y \equiv x \}. \]


That is, [x] is the set of all elements of S that are equivalent to x. This subset of S is called 
the equivalence class of x. 


Example 2.5.12. From Example 2.5.2, the equivalence class of Auckland is the set of all 
cities accessible from Auckland, that is, all cities on the north island of New Zealand. It 
is the same as the equivalence class of Wellington. 


Example 2.5.13. For equivalence mod 6 from Example 2.5.5, given any x ∈ Z, 

\[ [x] = \{ y \in \mathbb{Z} : y - x = 6k \text{ for some } k \in \mathbb{Z} \}. \]


Thus, [x] is the set of all integers y of the form y = 6k + x, where k can be any integer. 
(See Exercise 7.) 


Example 2.5.14. From Example 2.5.6, there are six assignments, and a given assignment 
can be altered into any of the remaining five by rotations and/or flips. Thus there is only 
one equivalence class in the set of assignments. 
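
The claim about the triangle can be confirmed by brute force. In the sketch below an assignment is modeled as a tuple (top, lower left, lower right); the names `rotate`, `flip`, and `equivalence_class` are hypothetical, and the particular rotation direction and flip axis are modeling assumptions that do not affect the outcome.

```python
from itertools import permutations


def rotate(x):
    """Rotate the triangle by 120 degrees: each label moves to the next corner."""
    top, left, right = x
    return (right, top, left)


def flip(x):
    """Flip the triangle over about its vertical axis: the lower corners swap."""
    top, left, right = x
    return (top, right, left)


def equivalence_class(x):
    """All assignments reachable from x by repeated rotations and flips."""
    seen = {x}
    frontier = [x]
    while frontier:
        current = frontier.pop()
        for move in (rotate, flip):
            reached = move(current)
            if reached not in seen:
                seen.add(reached)
                frontier.append(reached)
    return seen


assignments = set(permutations((1, 2, 3)))
print(len(assignments))                               # 6
print(equivalence_class((1, 2, 3)) == assignments)    # True
```

Starting from any one assignment, the moves reach all six, so the whole set is a single equivalence class.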


In all the preceding examples of equivalence relations, the equivalence classes seem 
to be a basis of a partition of the set. This is no coincidence, of course. Theorem 2.5.15 
states that properties E1-E3 can be used to demonstrate that the family of equivalence 
classes satisfies P1-P3. Most details are left to you in Exercise 8. 


Theorem 2.5.15. Suppose ≡ defines an equivalence relation on a set S. Define F = 
{[x] : x ∈ S}. That is, F is the family of all equivalence classes in S. Then F is a partition 
of S. 




Proof: We verify that properties P1–P3 from Definition 2.5.8 are satisfied. 


(P1) No element of F is empty. For if we pick some [x] ∈ F, then since x ≡ x, it 
follows that x ∈ [x]. Thus [x] ≠ ∅. 


(P2) Pick [x], [y] ∈ F and suppose [x] ∩ [y] ≠ ∅.... 


(P3) Since every set in F is defined to be a subset of S, we have $\bigcup_{x \in S} [x] \subseteq S$ by 
Exercise 5 from Section 2.2. Thus we must show ⊇. Pick y ∈ S.... ∎ 


Theorem 2.5.15 is very important for the following reason. If some form of equivalence 
≡ is defined on a set S, and it can be verified that this definition is an equivalence 
relation, then Theorem 2.5.15 guarantees that S is automatically partitioned into equivalence 
classes by this definition of ≡. We can then think of any element of a particular 
equivalence class as being representative of all elements in its class. Furthermore, we can 
think of ≡ in the same way we think of equality, where two equivalent elements are, in 
some sense, interchangeable. The partitioning of S into equivalence classes lumps the 
elements of S into categories where one element can be replaced with any other element 
in its category, within limits, of course. In Section 2.6 we'll illustrate this phenomenon 
on the rational numbers, where we’ll investigate the familiar definition of equality of 
two fractions. Then we'll show how the binary operations of addition and multiplica- 
tion on Q are defined so that different representative elements of equivalence classes are 
indeed interchangeable. 
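
On a finite set you can watch Theorem 2.5.15 happen. The sketch below builds the family of equivalence classes straight from the definition of [x] and then checks properties P1–P3. The names `equivalence_classes` and `same_remainder_mod_4` are hypothetical, and the set {0, 1, ..., 9} with divisibility of x − y by 4 is just a convenient example.

```python
def equivalence_classes(S, related):
    """Return the distinct classes [x] = {y in S : y is related to x}."""
    classes = []
    for x in S:
        cls = frozenset(y for y in S if related(y, x))
        if cls not in classes:      # [x] may already have appeared as [w]
            classes.append(cls)
    return classes


def same_remainder_mod_4(x: int, y: int) -> bool:
    return (x - y) % 4 == 0


S = range(10)
classes = equivalence_classes(S, same_remainder_mod_4)
print([sorted(c) for c in classes])
# [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]

# P1: no class is empty; P2: distinct classes do not overlap; P3: the union is S.
p1 = all(len(c) > 0 for c in classes)
p2 = all(a == b or a.isdisjoint(b) for a in classes for b in classes)
p3 = set().union(*classes) == set(S)
print(p1, p2, p3)  # True True True
```

If you hand the function a relation that is not an equivalence relation, the resulting family need not be a partition, which is one way to see that E1–E3 are doing real work.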


EXERCISES 


1. Prove that ≡ₙ in Example 2.5.5 is an equivalence relation. 
2. Show that Definition 2.1.2 defines an equivalence relation on the set of all sets. 


3. Let F be the family of all nonempty sets, and define A ≡ B if A ∩ B ≠ ∅. Is ≡ an 
equivalence relation on F? 


4. Let S be the set of four-letter words in the Random House Unabridged Dictionary. For 
two words x, y ∈ S define x ≡ y if it’s possible to construct a sequence of words in 
S, beginning with x and changing one letter at a time to create another word in S at 
each step, ending up at y. (A sequence of one word is considered valid.) For example 
BABY ≡ POOP because of the sequence BABY, BABE, BARE, BARN, BURN, BURP, 
BUMP, PUMP, POMP, POOP. Show ≡ is an equivalence relation on S. 


5. Is < an equivalence relation on R? Prove or disprove. 


6. Is ~ an equivalence relation on R? Prove or disprove. 


7. How many equivalence classes are there in Example 2.5.13? Describe them by listing 
some of their elements. 


8. Complete the proof of Theorem 2.5.15 by showing that the set of equivalence classes of an
equivalence relation ≡ satisfies properties P2 and P3.³⁰


³⁰ Take some hints from Example 2.5.10.
9. Consider a square in the xy plane, turned diagonally so that its corners are on the
axes. Define an assignment of {1, 2, 3, 4} as in Example 2.5.6. How many assignments
are there? For two assignments x and y, define x ≡ y if x can be converted into y by
some combination of rotation about its center point and/or flipping the square over
from top to bottom. How many equivalence classes are there? List their elements.


2.6 Equality, Addition, and Multiplication in Q


As an illustration of the concept of equivalence relation, we want to take a look at equality,
addition, and multiplication in the integers and see how they give rise to equality, addition, 
and multiplication in the rationals. This section is effectively a dissection of some of the 
real number properties assumed in Chapter 0, so a few prefacing words are in order about 
what we are trying to accomplish here. 

First, let’s consider the whole numbers. Beginning with the set {0, 1}, which by
assumption A15 is indeed a two-element set, we calculate all possible sums of its elements.
We know from the behavior of 0 that 0 + 0 = 0 and 1 + 0 = 0 + 1 = 1. What is not
immediately clear is 1 + 1. Is it possible that 1 + 1 = 0 or 1 + 1 = 1? In Exercise 1, you’ll
show that certain axioms and properties of R prevent either of these from being possible.
Since 1 + 1 ∉ {0, 1}, it might be helpful and simpler to devise a new symbol, such as maybe
2, to denote 1 + 1. This is just the beginning of a process of building the whole numbers
as a set of successors of previously defined whole numbers. The assumptions used to build
them are called Peano’s postulates. Because they are formulated without reference to the
set of real numbers as a context, Peano’s postulates and their implications generate the set
of whole numbers in a somewhat more complicated way than we describe here. One of
the assumptions is that 0 is not a successor to any whole number, so that the sequence of
successors, which we call 1, 2, 3, ..., never circles back around to 0.

Once the set of whole numbers has been constructed, we extend it to the integers by
tossing in the additive inverses of all the whole numbers. These additive inverses are in fact
new elements not already in W, except for −0 = 0. For if any whole number a ≥ 1 had b ∈
W as its additive inverse, then a + b = 0 would imply that 0 is the successor to a + b − 1.

In this section we want to give ourselves the standard assumptions concerning equality
and well-definedness of addition and multiplication on the integers, just as we did for
R in properties A1, A2, and A8 from Chapter 0. For now, however, we assume these
properties only for the integers. Specifically, using the symbol =_Z to represent equality
on the set of integers, let’s assume the following:


(Z1) For every n ∈ Z, n =_Z n.

(Z2) If m, n ∈ Z and m =_Z n, then n =_Z m.

(Z3) If m, n, p ∈ Z, m =_Z n and n =_Z p, then m =_Z p.


Let’s also restate assumptions A2 and A8, that addition and multiplication are well
defined on R, in terms that apply only to the integers. To be clear that addition and
multiplication are being performed only on integers, let’s use +_Z and ×_Z to represent
these operations.




(Z4) Addition and multiplication are well defined on the integers. That is, if m, n, p, q ∈
Z, m =_Z n, and p =_Z q, then m +_Z p =_Z n +_Z q and m ×_Z p =_Z n ×_Z q.


In what follows, we want to use assumptions Z1–Z4 on Z to derive similar results
on Q. We’ll also use the other assumptions from Chapter 0 concerning the behavior
of addition and multiplication, but only for the integers. To be clear, we’ll use the new
notation from assumptions Z1–Z4 to show that the properties are being applied only
to the integers. With these things in mind, we’re ready to dissect equality, addition, and
multiplication in Q and see how they derive from the same concepts in Z.


2.6.1 Equality in Q 
In Section 0.2 we defined the set of rational numbers as 


Q = {p/q : p, q ∈ Z, q ≠ 0}.


There is nothing inherent in the definition of Q that suggests how you would address
equality of two elements p/q and r/s. In fact, there is more than one useful way to define
equality on Q, but we will look only at the familiar one.

What is the familiar meaning of the statement p/q = r/s?³¹ It certainly does not
mean that p = r and q = s. We’ll think of the equality of two rational numbers as being
defined in terms of equality of the integers ps and qr. That is, if we use the symbol =_Q to
mean equality of two rational numbers, then the standard definition of rational equality
is

p/q =_Q r/s  ⇔  p ×_Z s =_Z q ×_Z r.    (2.75)

Notice what we’ve done here. We’ve defined =_Q in terms that use only =_Z and ×_Z. Notice
also the positions of p, q, r, s in Eq. (2.75). These positions must be retained when we
translate between p/q =_Q r/s and p ×_Z s =_Z q ×_Z r.

Now we need to verify that =_Q has the properties we would want any definition of
equality to have (E1–E3) by assuming Z1–Z4.


Theorem 2.6.1. The relation =_Q is an equivalence relation on Q.


Proof: Most of the details are left to you in Exercise 2.


(E1) =_Q is reflexive. Pick p/q ∈ Q. Then since multiplication of integers is
commutative, we have that p ×_Z q =_Z q ×_Z p. Thus by the definition of =_Q in
Eq. (2.75), p/q =_Q p/q.

(E2) =_Q is symmetric. Suppose p/q =_Q r/s. Then.... Thus r/s =_Q p/q.

(E3) =_Q is transitive. Suppose p/q =_Q r/s and r/s =_Q t/u. Then .... Thus
p/q =_Q t/u. ∎


³¹ Think in terms of cross multiplication.

Thanks to Theorem 2.6.1, it follows from Theorem 2.5.15 that Q is partitioned into
equivalence classes. Presumably, you have an idea of what these are. First, all expressions
of the form 0/q where q ≠ 0 are equivalent to each other. Conversely, if p_1/q_1 =_Q 0/q_2,
then p_1 = 0. For the case of nonzero numerators, if p/q =_Q r/s, then we may find a
common denominator qs, and write p/q = ps/qs and r/s = qr/qs. By the way we
defined equality of fractions, the numerators ps and qr are equal. Therefore, if p/q is
a fraction in fully reduced form, the equivalence class of p/q can be thought of as all
fractions that can be reduced by standard cancellation to p/q. We’re then always free to
address rational numbers in a form where the fraction has been fully reduced.
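As a small illustration of these equivalence classes, the following Python sketch represents a fraction p/q as the pair `(p, q)`. The names `rat_eq` and `reduce_fraction` are hypothetical, and the choice of a positive denominator for the reduced representative is an extra convention assumed here, not one stated in the text.

```python
from math import gcd


def rat_eq(a, b):
    """(p, q) and (r, s) are equal as rationals exactly when p*s == q*r, as in Eq. (2.75)."""
    (p, q), (r, s) = a, b
    return p * s == q * r


def reduce_fraction(frac):
    """Return the fully reduced representative of the class of frac = (p, q).

    Assumed convention: the representative has a positive denominator.
    """
    p, q = frac
    if q == 0:
        raise ValueError("denominator must be nonzero")
    g = gcd(p, q)  # math.gcd returns a positive value here because q != 0
    if q < 0:
        g = -g  # dividing by a negative g flips both signs
    return (p // g, q // g)


print(rat_eq((1, 2), (3, 6)))      # True
print(reduce_fraction((6, 4)))     # (3, 2)
print(reduce_fraction((2, -4)))    # (-1, 2)
print(reduce_fraction((0, 7)))     # (0, 1)
```

Two pairs belong to the same class exactly when they reduce to the same representative, so `reduce_fraction` picks out one canonical name for each rational number.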


2.6.2 Well-defined + and × on Q
Now that we’ve shown standard equality of rational numbers is an equivalence relation,
we can define a form of addition of rational numbers, which we’ll denote +_Q, and show
that it is well defined. What definition of +_Q should we use?³² We’ll use the definition,

p/q +_Q r/s = (ps +_Z qr)/(qs).    (2.76)

Observe that Eq. (2.76) will always produce a “legitimate” result: that is, the fact that
p/q, r/s ∈ Q ensures that q, s ≠ 0. Consequently, qs ≠ 0 by the principle of zero
products. Thus Eq. (2.76) fits the required form for elements of Q because it has an
integer numerator and nonzero integer denominator, and we can say that +_Q is closed
on Q. Here’s the theorem you’ll prove in Exercise 3.


Theorem 2.6.2. Addition on Q as defined in Eq. (2.76) is well defined. That is, if p/q, r/s,
t/u, v/w ∈ Q, and if p/q =_Q r/s and t/u =_Q v/w, then p/q +_Q t/u =_Q r/s +_Q v/w.

In the same way you showed that =_Q is an equivalence relation by assuming that
=_Z is an equivalence relation, you’ll show +_Q is well defined by assuming +_Z and ×_Z
are well defined. Notice from Eq. (2.76) that anything of the form 0/s functions as the
additive identity in Q, for

p/q +_Q 0/s = (ps + q·0)/(qs) = (ps)/(qs) =_Q p/q.    (2.77)

Also, notice (−p)/q and p/(−q), which are equivalent, function as −(p/q).
Multiplication of two rational numbers is defined in the following way:

p/q ×_Q t/u = (pt)/(qu).    (2.78)


Proving multiplication is well defined is easier than for addition (Exercise 4). 


Theorem 2.6.3. Multiplication on Q as defined in Eq. (2.78) is well defined. That is, if
p/q, r/s, t/u, v/w ∈ Q, and if p/q =_Q r/s and t/u =_Q v/w, then p/q ×_Q t/u =_Q
r/s ×_Q v/w.


Notice anything of the form p/p functions as the multiplicative identity, and
(p/q)^(−1) = q/p, as long as p ≠ 0.
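The meaning of "well defined" in Theorems 2.6.2 and 2.6.3 can be spot-checked numerically. The sketch below again models p/q as a pair `(p, q)`; the function names and the sample fractions are illustrative assumptions. It checks one instance only, so it illustrates the theorems rather than proving them.

```python
def rat_eq(a, b):
    """Rational equality from Eq. (2.75): p*s == q*r."""
    (p, q), (r, s) = a, b
    return p * s == q * r


def rat_add(a, b):
    """Addition from Eq. (2.76): p/q + r/s = (ps + qr)/(qs)."""
    (p, q), (r, s) = a, b
    return (p * s + q * r, q * s)


def rat_mul(a, b):
    """Multiplication from Eq. (2.78): p/q * t/u = (pt)/(qu)."""
    (p, q), (t, u) = a, b
    return (p * t, q * u)


# Two different representatives of 1/2 and two of 2/3 (sample values).
x1, x2 = (1, 2), (3, 6)
y1, y2 = (2, 3), (-4, -6)

sum1, sum2 = rat_add(x1, y1), rat_add(x2, y2)
prod1, prod2 = rat_mul(x1, y1), rat_mul(x2, y2)

print(sum1, sum2, rat_eq(sum1, sum2))      # (7, 6) (-42, -36) True
print(prod1, prod2, rat_eq(prod1, prod2))  # (2, 6) (-12, -36) True
```

The results are different pairs of integers, yet they are equal in the sense of Eq. (2.75), which is what interchangeability of representatives means.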


³² What can serve as a common denominator? What will the numerator therefore be?




EXERCISES 


1. Show that 1 + 1 = 0 and 1 + 1 = 1 are impossible, given the assumptions and
properties of R we have already addressed.³³


2. Complete the proof of Theorem 2.6.1, that =_Q as defined in Eq. (2.75) is an equivalence
relation, by showing it has properties E2 and E3.


3. Prove Theorem 2.6.2: If p/q, r/s, t/u, v/w ∈ Q, and if p/q =_Q r/s and t/u =_Q
v/w, then p/q +_Q t/u =_Q r/s +_Q v/w.


4. Prove Theorem 2.6.3: If p/q, r/s, t/u, v/w ∈ Q, and if p/q =_Q r/s and t/u =_Q
v/w, then p/q ×_Q t/u =_Q r/s ×_Q v/w.


2.7 The Division Algorithm and Divisibility 


A theory of real numbers is a fascinating progression of ideas beginning with the set
{0, 1} and developing through the famous subsets of R: W ⊂ Z ⊂ Q ⊂ R. In Section 2.3
we investigated properties of R that follow from assumptions A1–A22, but we did not
look into any unique properties of certain subsets of R. This text is hardly designed to
be a thorough treatment of the real numbers, for the real numbers are an intricate set
with properties characteristic of some of the most complex mathematical structures.
However, some characteristics of the integers, rationals, and irrationals are basic and
representative of the more abstract mathematical structures that you will become very
familiar with in time. In this section, we investigate a few of the basic properties of the
integers.


2.7.1 Even and odd integers; the division algorithm 


One way to assign meaning to the words even integer and odd integer is the following: 


Definition 2.7.1. If n ∈ Z, we say that n is even if there exists k ∈ Z such that n = 2k.
We say that n is odd if there exists k ∈ Z such that n = 2k + 1.


Theorem 2.7.2. Suppose m, n ∈ Z are both even. Then mn is even.


Proof: Suppose m, n ∈ Z are both even. Then there exist k_1, k_2 ∈ Z such that
m = 2k_1 and n = 2k_2. Let l = 2k_1 k_2, which is an integer. Then


mn = (2k_1)(2k_2) = 4k_1 k_2 = 2(2k_1 k_2) = 2l.    (2.79)


Thus mn is even. ∎


³³ Consider the trichotomy law and cancellation of addition.

Corollary 2.7.3. If n ∈ Z is even, then n^2 is even.


Proof: Let m = n in Theorem 2.7.2. ∎


The proof of the following similar theorem is left to you in Exercise 1. Corollary 2.7.5 
will prove useful in Section 2.8. 


Theorem 2.7.4. Suppose m, n ∈ Z are both odd. Then mn is odd.

Corollary 2.7.5. If n ∈ Z is odd, then n^2 is odd.


At this point, you may be thinking that Definition 2.7.1 addresses the only two 
possible situations that can happen in the integers. After all, every integer is either even 
or odd, isn’t it? Well, maybe, but how do you know that? How do you know that every 
integer can be written either in the form 2k or 2k + 1, but not both? If you’ve sensed the 
need to justify the fact that every integer is either even or odd, but not both, then you’re
catching onto the game of mathematics. 

Ever since elementary school, you’ve been familiar with the idea of dividing an integer
b by another integer a > 0 to produce a quotient q and remainder r. One way you could
express the results of your division calculation in an equation is to write


b = aq + r,    (2.80)


where, you hope, 0 ≤ r < a. For example, if a = 12, b = 88, we can write 88 = 12 × 7 + 4,
and if a = 6, b = −13, we have −13 = 6 × (−3) + 5. One nice thing about the form
of Eq. (2.80) is that every number involved is an integer. What resembles the division
of integer by integer is written without having to resort to the rational numbers. The
following theorem says that the existence of a quotient q and a remainder 0 ≤ r < a is
guaranteed. Even more, they are unique. The theorem is a surprisingly useful one.


Theorem 2.7.6 (Division Algorithm). Let a, b ∈ Z, where a > 0. Then there exist
unique q, r ∈ Z such that b = aq + r and 0 ≤ r < a.
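For readers who like to experiment, Python's built-in `divmod` returns exactly such a pair when the divisor is positive, because Python's integer division rounds toward negative infinity. The wrapper name `quotient_remainder` below is an illustrative assumption; the two calls reuse the examples from the text.

```python
def quotient_remainder(b, a):
    """Return (q, r) with b == a*q + r and 0 <= r < a, for an integer a > 0."""
    if a <= 0:
        raise ValueError("the divisor a must be positive")
    q, r = divmod(b, a)  # floor division, so r is never negative when a > 0
    return q, r


print(quotient_remainder(88, 12))   # (7, 4)
print(quotient_remainder(-13, 6))   # (-3, 5)
```

Note that some other languages truncate toward zero instead, which would give a negative remainder for b = −13 and would not match the theorem.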


We’re going to provide part of the proof here, the existence of q and r for the case
b ≥ 0. In Exercise 5, you’ll show existence for b < 0 by exploiting the existence of q
and r for −b > 0 and then you’ll show uniqueness. The technique we’ll use in showing
the existence for b ≥ 0 will call on the WOP of W, and is similar to a technique you’ll
use later when you prove Theorem 2.7.13. For Theorem 2.7.6, we use the set


S = {b − aq : q ∈ Z, b − aq ≥ 0},    (2.81)


which is merely the set of all whole numbers you can generate by subtracting integer
multiples of a from b. For example, if a = 12 and b = 104, then


S = {..., 128, 116, 104, 92, 80, 68, 56, 44, 32, 20, 8}.    (2.82)


The last element of S is the result of letting q = 8. Notice that it is the smallest element of S,
and is strictly less than 12. By constructing the corresponding S according to Eq. (2.81)
for an arbitrary a > 0 and b ≥ 0, we can apply the WOP to S, then show that this
smallest element is the value of r we want. Here’s the proof of Theorem 2.7.6, and in as
much detail as we promised.




Proof: Let a, b ∈ Z, where a > 0. First, we consider the case b ≥ 0. Define the set S
as in Eq. (2.81). By definition, S ⊆ W, and since b ≥ 0, we may let q = 0 to see that
b ∈ S, so that S ≠ ∅. By the WOP, S has a smallest element, which we may denote r,
and we also note that r is of the form b − aq for some q ∈ Z. Thus, we have that
b = aq + r, where r ≥ 0, and we must show that r < a. Suppose r ≥ a. Then


r = b − aq > b − aq − a = b − a(q + 1) = b − aq − a = r − a ≥ 0,    (2.83)


and therefore b − a(q + 1) is an element of S that is strictly smaller than r. This
contradicts our choice of r and so it must be that r < a.

According to Exercise 5, the requisite values of q and r exist for b < 0, and q
and r are unique. ∎
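The existence argument can be mirrored in a few lines of Python for the case b ≥ 0. This is only a sketch under stated assumptions: the name `remainder_via_wop` is hypothetical, and the code lists only the elements of S that come from q ≥ 0, since negative values of q produce elements larger than b and cannot be the minimum.

```python
def remainder_via_wop(b, a):
    """For a > 0 and b >= 0, find (q, r) by taking the smallest element of S.

    S = {b - a*q : q in Z, b - a*q >= 0}. Only the finite part with q >= 0 is
    built, because elements with q < 0 exceed b and are never the minimum.
    """
    if a <= 0 or b < 0:
        raise ValueError("this sketch covers only a > 0 and b >= 0")
    s_slice = []
    q = 0
    while b - a * q >= 0:
        s_slice.append(b - a * q)
        q += 1
    r = min(s_slice)          # the well-ordering principle guarantees a minimum
    q = (b - r) // a          # recover the q that produced r (exact division)
    return q, r, s_slice


q, r, s_slice = remainder_via_wop(104, 12)
print(s_slice)  # [104, 92, 80, 68, 56, 44, 32, 20, 8]
print(q, r)     # 8 8
```

The printed slice matches the nonnegative tail of the set in Eq. (2.82), and its least element 8 is the remainder, obtained with q = 8.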


From Section 2.5 we know that 81 ≡_7 25 because 81 − 25 = 56 = 8 × 7. Applying
the division algorithm to a = 7 and b_1 = 81, then to a = 7 and b_2 = 25, we have
81 = 7 × 11 + 4 and 25 = 7 × 3 + 4. Therefore, 81 and 25 have the same remainder
when divided by 7 according to the division algorithm. This illustrates the following
theorem, which says merely that x and y differ by a multiple of n if and only if they have
the same remainder when divided by n (Exercise 6).


Theorem 2.7.7. Suppose x, y ∈ Z and n ∈ N. Then x ≡_n y if and only if x and y have
the same remainder when divided by n according to the division algorithm.
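A quick numerical check of Theorem 2.7.7 is sketched below. The helper names are illustrative assumptions, and the exhaustive loop covers only a small range of integers, so it is evidence for the theorem rather than a substitute for the proof requested in Exercise 6.

```python
def congruent_mod(x, y, n):
    """x is congruent to y mod n: the difference x - y is a multiple of n."""
    return (x - y) % n == 0


def remainder(x, n):
    """Remainder r with 0 <= r < n from the division algorithm (n > 0)."""
    return x % n


print(remainder(81, 7), remainder(25, 7))  # 4 4

# Compare the two conditions on a small grid of sample values.
theorem_holds = all(
    congruent_mod(x, y, n) == (remainder(x, n) == remainder(y, n))
    for n in range(1, 10)
    for x in range(-30, 31)
    for y in range(-30, 31)
)
print(theorem_holds)  # True
```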


2.7.2 Divisibility in Z 


Motivated by Theorem 2.7.6, if a, b ∈ Z \ {0}, and there exists k ∈ Z such that b = ak,
we say that a divides b or that a is a divisor of b, and we write a | b. If a | b where a ∉ {±1,
±b}, we call a a proper divisor of b. First, here are some really easy theorems about
divisibility that you’ll prove in Exercises 8–11.


Theorem 2.7.8. If a ∈ Z\{0}, then a | a.

Theorem 2.7.9. If a, b, c ∈ Z\{0} such that a | b and b | c, then a | c.

Theorem 2.7.10. If a | b and a | c, then for all m, n ∈ Z, a | (mb + nc).


An expression of the form mb + nc is called a linear combination of b and c. Theo- 
rem 2.7.10 says that if a divides both b and c, then it divides any linear combination of 
them. 
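Theorem 2.7.10 is easy to test on sample values. In the sketch below, `divides` is a hypothetical helper; it also reports that a divides 0, which goes slightly beyond the definition in the text (stated there for nonzero integers) but is convenient because a linear combination can equal 0.

```python
def divides(a, b):
    """Return True when b == a * k for some integer k (a must be nonzero)."""
    if a == 0:
        raise ValueError("a must be nonzero")
    return b % a == 0


a, b, c = 3, 12, -9  # sample values with a | b and a | c

# Check a | (m*b + n*c) for a small grid of coefficients m and n.
combos_ok = all(
    divides(a, m * b + n * c)
    for m in range(-5, 6)
    for n in range(-5, 6)
)
print(divides(a, b), divides(a, c), combos_ok)  # True True True
```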


Theorem 2.7.11. If a | b and b | a, then a = ±b.


Some important and useful results in algebra stem from what we call the greatest 
common divisor (gcd) of two nonzero integers a and b. One practical way you might 
try to find the gcd of two integers is to break them down into their prime factorizations 
(Theorem 2.4.8 even though we didn’t prove uniqueness there), then see how many 2’s, 
3’s, 5’s, etc. you can extract from both in order to construct the gcd. This might work 
well practically, but: 1) its logical basis in the work we’ve done so far is uncertain at 
best; and 2) it is not particularly useful as leverage in our later theorems. Instead, we
define gcd in terms of two criteria, with one motivated by the word common and the 
other motivated by the word greatest. Then we show that such a thing exists uniquely 
and can be written in a somewhat surprising way. So that we will have uniqueness of 
gcd(a, b), we’re going to insist that any integer we might be inclined to call a gcd must 
be positive. 


Definition 2.7.12. Suppose a, b ∈ Z\{0}, and suppose g ∈ Z^+ has the following
properties:


(D1) g | a and g | b.


(D2) If h is any positive integer with the properties that h | a and h | b, then it must be that
h | g also.


Then g is called a greatest common divisor of a and b, and is denoted gcd(a, b).


Some remarks about Definition 2.7.12 are in order. First, nothing about the definition 
of gcd(a, b) (or of any term for that matter) guarantees that any such thing exists. 
It merely lays out criteria by which we declare some positive integer to be gcd(a, b). 
Second, property D1 clearly states that anything you want to call gcd(a, b) will in fact 
be a common divisor of a and b. However, we need to explain and perhaps justify our 
choice of property D2 as the other criterion, for it might seem a bit unnatural. If property 
D2 is supposed to be some way of describing what it means for g to be greatest of all 
the common divisors of a and b, you might think a more natural way to say it would 
be: 


(D2) If h is any positive integer with the properties that h | a and h | b, then it must be
that h ≤ g.


Well, we could do it that way, but we don’t and this is why: It all centers around
how you decide to measure greatness. In Z^+, we can measure relative greatness by ≤.
But as you’ll see in your later work in algebra, there are other more abstract settings
where we discuss divisibility, then ask about a gcd, but we do not necessarily have a
notion of ≤ to use as our criterion for greatness. For example, if you want to talk
about one polynomial dividing another polynomial and then ask about their gcd, it
would not seem to mean much to say that some common divisor of theirs is greatest
of all, in the sense of ≤. After all, what could f ≤ g mean for two functions? Not
that we cannot give meaning to ≤ in that context, but the point is that it’s better to
develop a criterion for greatness of a divisor that stays within the context of divisibility
rather than relying on some external measure like ≤, which might or might not
already exist. All this is to say that property D2 is our measure of greatness among all
common divisors of a and b. If g is what we’re going to call gcd(a, b), it has the property
that any positive integer h that comes down the pike that also has property D1
cannot, in some sense, be greater than g. Our way of saying h is no greater than g is
that h | g.

Therefore, given a, b ∈ Z \ {0}, how do we even know that there exists some g ∈ Z^+
having properties D1–D2? And if such a g ∈ Z^+ does exist, how many can there be? The
following theorem claims that gcd(a, b) exists uniquely. Furthermore, hidden inside its
proof is some additional information about gcd(a, b) that will come in handy later.
Finally, here it is. Only a few of the details are required of you in Exercise 15.


Theorem 2.7.13. Suppose a, b ∈ Z\{0}. Then there exists a unique g ∈ Z^+ having
properties D1–D2.


Proof: Pick a, b ∈ Z\{0} and define

S = {ma + nb : m, n ∈ Z, ma + nb > 0}.    (2.84)


That is, S is the set of all positive linear combinations of a and b. First, S is not empty,
for depending on the signs of a and b, we may let m, n = ±1 to produce some
ma + nb > 0. By the WOP, S contains a smallest element g, which may be written
in the form g = m_0 a + n_0 b for some m_0, n_0 ∈ Z. With Exercise 15, g has properties
D1–D2, and if g_1, g_2 ∈ Z^+ both have properties D1–D2, then g_1 = g_2. ∎


Look at the serendipity in the proof of Theorem 2.7.13. The gcd of a and b can be
written in the form ma + nb for some m, n ∈ Z, and the smallest such positive expression
is in fact the gcd. This can be particularly helpful if gcd(a, b) = 1. If gcd(a, b) = 1, then
a and b are said to be relatively prime. Immediately we can see that if a and b are relatively
prime, then there exist m, n ∈ Z such that ma + nb = 1. Furthermore, if it is possible
to find a linear combination of a and b that equals 1, then this linear combination must
be the smallest element of S in Eq. (2.84), hence gcd(a, b) = 1.
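The description of gcd(a, b) as the smallest positive linear combination can be explored by brute force. The sketch below is not the argument of the proof and not an efficient algorithm; the function name is hypothetical, and the search window |m| ≤ |b|, |n| ≤ |a| is an assumption based on the standard fact that coefficients of that size always suffice. The result is compared with Python's `math.gcd`.

```python
from math import gcd


def smallest_positive_combination(a, b):
    """Search for the least positive value of m*a + n*b over a finite window.

    Assumption: searching |m| <= |b| and |n| <= |a| is enough to reach the
    minimum of the set S in Eq. (2.84).
    """
    if a == 0 or b == 0:
        raise ValueError("a and b must be nonzero")
    best = None
    for m in range(-abs(b), abs(b) + 1):
        for n in range(-abs(a), abs(a) + 1):
            value = m * a + n * b
            if value > 0 and (best is None or value < best[0]):
                best = (value, m, n)
    return best


g, m, n = smallest_positive_combination(12, 42)
print(g, m * 12 + n * 42, gcd(12, 42))  # 6 6 6

g1, m1, n1 = smallest_positive_combination(9, 28)
print(g1, m1 * 9 + n1 * 28)  # 1 1
```

Since 9 and 28 are relatively prime, the second call exhibits integers m and n with 9m + 28n = 1, as the paragraph above predicts.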

In Section 2.4, we defined p ∈ N to be prime if it has exactly two divisors in N,
namely 1 and p. Thus, if a ∈ Z\{0} and p ∈ N is prime, then gcd(a, p) is either 1 or
p, because these are the only divisors of p. If gcd(a, p) = p, then p satisfies D1, so that
p | a. With this, we have proved the following:


Theorem 2.7.14. If a ∈ Z and p ∈ N is prime, then either a and p are relatively prime
or p | a.


Theorem 2.7.14 will help you prove the following in Exercise 17: 


Theorem 2.7.15. If a, b ∈ Z, p ∈ N is prime, and p | ab, then either p | a or p | b.


As an immediate corollary, by letting a = b in Theorem 2.7.15, we have the following: 


Corollary 2.7.16. If p | a^2, then p | a.


Then with an induction argument in Exercise 18, you can prove the following: 


Theorem 2.7.17. Suppose a_1, a_2, ..., a_n ∈ Z and p ∈ N is prime. Suppose also that
p | a_1 a_2 ··· a_n. Then there exists some k (1 ≤ k ≤ n) such that p | a_k.


The last thing we will do in this section is return to the prime factorization of n ∈ N
whose existence we proved in Theorem 2.4.8 to extend it to include something like
uniqueness.

Theorem 2.7.18. The prime factorization of n ≥ 2 is unique, except perhaps for the order
in which the factors are written.


Proof: Choose n ∈ N, and suppose n = p_1 p_2 ··· p_k and n = q_1 q_2 ··· q_l are two
ways of writing n as a product of primes. We show by induction on k that l = k and,
with some possible reordering of the q_i, that p_i = q_i for all 1 ≤ i ≤ k.

If k = 1, then n = p_1, so that n is prime. Thus l = 1 and p_1 = q_1. Therefore
suppose k ≥ 2, and suppose that any factorization of a natural number into k − 1
primes is unique up to order of the factors. Since


p_1 p_2 ··· p_k = q_1 q_2 ··· q_l,    (2.85)


then p_k | q_1 q_2 ··· q_l. By Theorem 2.7.17, there exists some j, where 1 ≤ j ≤ l,
such that p_k | q_j. Since q_j is prime, p_k = q_j. Reordering the right-hand side of
Eq. (2.85) by switching q_l and q_j, then canceling p_k and q_l from Eq. (2.85), we have
factorizations of the natural number n/p_k into k − 1 and l − 1 primes. Since p_k ≥ 2,
we have n/p_k < n. Applying the inductive assumption to n/p_k, we conclude that
k − 1 = l − 1, and with some possible reordering of q_1, ..., q_(l−1), we have p_i = q_i
for 1 ≤ i ≤ l − 1. Thus l = k, and the factorization of n into k primes is unique up
to the order of the factors. ∎
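Uniqueness up to order can be seen in a small experiment. The sketch below uses trial division, a technique that is not part of the proof above and is chosen only because it is short; the function name `prime_factors` and the sample number 360 are illustrative assumptions. Factoring 360 directly, or factoring two different ways of splitting 360 and combining the results, gives the same sorted list of primes.

```python
def prime_factors(n):
    """Return the prime factors of n >= 2 in nondecreasing order (trial division)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:   # strip out every copy of the divisor d
            factors.append(d)
            n //= d
        d += 1
    if n > 1:               # whatever remains has no divisor up to its square root
        factors.append(n)
    return factors


direct = prime_factors(360)
via_12_times_30 = sorted(prime_factors(12) + prime_factors(30))
via_8_times_45 = sorted(prime_factors(8) + prime_factors(45))

print(direct)                                        # [2, 2, 2, 3, 3, 5]
print(direct == via_12_times_30 == via_8_times_45)   # True
```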
EXERCISES 


1. Prove Theorem 2.7.4: Suppose m, n ∈ Z are both odd. Then mn is odd.
2. Prove that the sum of two even integers is even. 

3. Prove that the sum of two odd integers is even. 

4. Prove that the sum of an even and an odd integer is odd. 


5. This exercise finishes the proof of the division algorithm, Theorem 2.7.6. Suppose
a, b ∈ Z, and a > 0.


(a) Suppose b < 0. Use the result already demonstrated on −b > 0 to prove the
existence of q, r ∈ Z such that b = aq + r and 0 ≤ r < a.³⁴


(b) Now that you know b = aq + r, where 0 ≤ r < a is possible for all a, b ∈ Z
with a > 0, show the uniqueness of q and r.³⁵


6. Prove Theorem 2.7.7: Suppose x, y ∈ Z and n ∈ N. Then x ≡_n y if and only if x and
y have the same remainder when divided by n according to the division algorithm.


7. Suppose a and b are integers such that a ≡_3 b ≢_3 0. Show that ab ≡_3 1.³⁶


³⁴ Write −b = aq_1 + r_1 and use two cases, r_1 = 0 and 0 < r_1 < a, to find the desired q and 0 ≤ r < a
for b.

³⁵ Suppose q_1, q_2, r_1, r_2 ∈ Z are such that b = aq_1 + r_1 (0 ≤ r_1 < a) and b = aq_2 + r_2 (0 ≤ r_2 < a).
If you suppose q_1 ≠ q_2, then you may assume without any loss of generality that q_2 > q_1. How does
this produce a contradiction?

³⁶ From Theorem 2.7.7, a and b will have the same nonzero remainder upon division by 3. Consider
both cases.
8. Prove Theorem 2.7.8: If a ∈ Z\{0}, then a | a.
9. Prove Theorem 2.7.9: If a, b, c ∈ Z such that a | b and b | c, then a | c.
10. Prove Theorem 2.7.10: If a | b and a | c, then for all m, n ∈ Z, a | (mb + nc).


11. Prove Theorem 2.7.11: If a | b and b | a, then a = ±b.³⁷
12. Use induction to show that 3 | (4^n − 1) for all n ∈ N.
13. Show that if n ∈ Z is odd, then 8 | (n^2 − 1).³⁸


14. Show that if a, b, c ∈ Z are all odd, then there are no rational solutions x to the
equation ax^2 + bx + c = 0.³⁹


15. Prove the following parts of Theorem 2.7.13. 


(a) If g = m_0 a + n_0 b is the smallest element of S as defined in Eq. (2.84), then
g | a.⁴⁰ (The proof that g | b is identical.)


(b) If h is any positive integer with the properties that h | a and h | b, then it must
be that h | g.


(c) If g_1 and g_2 both have properties D1–D2, then g_1 = g_2.⁴¹

16. Construct a parallel to Definition 2.7.12 of the term gcd(a_1, a_2, ..., a_n). State and
prove a parallel to Theorem 2.7.13.⁴²

17. Prove Theorem 2.7.15: If a, b ∈ Z and p | ab, then either p | a or p | b.⁴³


18. Prove Theorem 2.7.17: Suppose a_1, a_2, ..., a_n ∈ Z and p ∈ N is prime. Suppose
also that p | a_1 a_2 ··· a_n. Then there exists some k (1 ≤ k ≤ n) such that p | a_k.


19. Suppose p ∈ N is prime and a ∈ Z. Show that a^2 ≡_p 1 if and only if a ≡_p ±1.


2.8 Roots and irrational numbers 


In this section, we return to the real numbers to investigate a few more of its properties.
First, we investigate ⁿ√x for n ∈ N and x ∈ R. This provides a perfect opportunity for
a presentation of one of the most famous theorems of all time: the fact that √2 is not
rational. Alas, you will provide the proof.


³⁷ See Exercise 10f from Section 2.3.

³⁸ If k ∈ Z, then either k or k + 1 is even.

³⁹ If x = p/q is a rational solution, then you may assume p and q are not both even. Multiply the
equation through by q^2 and apply earlier results of this section to show that ap^2 + bpq + cq^2 is always
odd.

⁴⁰ If g ∤ a, then the division algorithm produces a contradiction by generating an element of S that is
smaller than g.

⁴¹ Theorem 2.7.11 should come in handy.

⁴² Relax. Induction won’t be necessary. The definition and steps of the proof can be done using
universal and existential quantifiers.

⁴³ See Exercise 3g from Section 1.2 for the umpteenth time.
2.8.1 Roots of real numbers

In Section 0.2, assumption A22 states that for every real number x > 0 and n ∈ N, there
is a real solution y to the equation y^n = x. That is, nth roots of positive real
numbers exist in R. In this section, we address the existence and possible uniqueness of
solutions to y^n = x for all x ∈ R. We’ll not make any claims that you are not already
familiar with, but we do need to prove them. Most of the proof is left to you in Exercise 1,
with a thorough outline provided.

You might be wondering why assumption A22 uses the language of equation solving
(y^n = x) to address roots of real numbers and only makes slight reference to the
expression ⁿ√x. To do otherwise is to put the cart before the horse. We begin by letting
the expression ⁿ√x mean any solution y to the equation y^n = x. To say y = ⁿ√x is to say
y^n = x. However, there could be a problem because such a solution y ∈ R might not
exist if x < 0. Furthermore, possible confusion could result because there might be more
than one such y. In this case, we need to decide which solution of y^n = x we will declare
to be the unambiguous value of ⁿ√x. By the end of this section, then, the expression ⁿ√x
will be clearly and uniquely defined for appropriate x ∈ R and n ∈ N.

Another point to make here is that assumption A22 is not a standard axiomatic 
assumption in a rigorous study of the real numbers. It actually follows from the Least 
Upper Bound axiom. In this text, we accept it without question. Here is the main theorem 
concerning roots of real numbers. 


Theorem 2.8.1. Let n ∈ N. Then the following hold:
(X1) The equation y^n = 0 has the unique solution y = 0.
(X2) Concerning even roots:


(a) If x > 0, the equation y^(2n) = x has precisely two distinct solutions in R, and
these are additive inverses of each other.


(b) If x < 0, the equation y^(2n) = x has no solution in R.


(X3) Concerning odd roots, the equation y^(2n+1) = x has a unique solution for every x ∈ R.

Part X3 of Theorem 2.8.1 overlooks the case n = 1, which is acceptable because it’s
trivial. Property X1 should be clear, for certainly 0^n = 0, and if y^n = 0, then y = 0 from
the principle of zero products. Thus there is no other value of ⁿ√0. See Exercise 1 for the
completion of claims X2–X3.
With Theorem 2.8.1, we can now introduce the notation ⁿ√x and define it
unambiguously.


Definition 2.8.2. If n ∈ N, then ²ⁿ⁺¹√x is defined to be the unique solution y of the
equation y^(2n+1) = x. If x ≥ 0, the expression ²ⁿ√x is defined to be the unique, nonnegative
solution y of the equation y^(2n) = x.


Theorem 2.8.1 and Definition 2.8.2 lead us to the following principles of algebraic
manipulation: For x ≥ 0, y^(2n) = x if and only if y = ±²ⁿ√x. Similarly, for all x ∈ R,
y^(2n+1) = x if and only if y = ²ⁿ⁺¹√x.
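The case analysis in Theorem 2.8.1 translates directly into a short function. The sketch below is a floating-point illustration only, so its outputs are approximations rather than exact roots. The name `real_nth_roots` is hypothetical, and its parameter `n` is the full exponent in y^n = x (even or odd), not the n that appears in 2n and 2n + 1 above.

```python
def real_nth_roots(x, n):
    """Return a list of the real solutions y of y**n == x, for an integer n >= 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if x == 0:
        return [0.0]                       # X1: the only solution is 0
    if n % 2 == 1:
        magnitude = abs(x) ** (1.0 / n)    # X3: odd exponent, one solution
        return [magnitude if x > 0 else -magnitude]
    if x < 0:
        return []                          # X2(b): even exponent, negative x
    principal = x ** (1.0 / n)             # X2(a): two solutions, +/- the principal root
    return [-principal, principal]


print(len(real_nth_roots(16, 4)))   # 2
print(len(real_nth_roots(-4, 2)))   # 0
print(len(real_nth_roots(-8, 3)))   # 1
```

For an even exponent and positive x, the last entry of the returned list is the nonnegative solution that Definition 2.8.2 singles out as the value of the root.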




2.8.2 Existence of irrational numbers 


Up to this point, we have said virtually nothing about irrational numbers, that is, real 
numbers that are not rational. You've worked with many numbers that are irrational, 
including π, e, and most numbers of the form ⁿ√x. You’ve also been taught that irrational
numbers have a decimal representation that does not terminate and does not fall into a 
pattern of repetition. However, you might never have seen a definitive argument that any 
particular number is irrational. Here we address this. But first, a little history. 

Some of the intuitive properties of R that we likely take for granted were not a part 
of the thinking of the ancient Greeks, most notably the Pythagoreans, that very secret 
order of thinkers who produced some amazingly sophisticated mathematics. One such 
intuitive notion we probably have is that the set of real numbers is a sort of continuum, a 
set that can be visualized in terms of the ordered points on a straight line, where rationals 
and irrationals are spread up and down like strewn grains of salt and pepper, leaving no 
holes in the continuum. The Greeks had no concept of irrational numbers at first, and 
now is a good time to touch on some of their views. Here is a very brief description of 
what they meant. 

To the Greeks, numbers were best visualized in terms of lengths of segments, areas of 
rectangles, and volumes of rectangular solids, all of which were constructible according 
to certain rules. They had very sophisticated techniques for constructing them using 
the only two geometric shapes they considered perfect, straight line segments (crudely 
constructible with a straight-edge) and circles (crudely constructible with a compass). In 
Euclid’s all-important compilation of the best of Greek mathematics (a thirteen-book set
called Elements), he demonstrated amazingly sophisticated mathematical results, such 
as the theorem attributed to Pythagoras, using only some assumptions about circles and 
line segments. 

With these techniques of construction, it is possible to imagine in the following way 
the drawing, if you will, of a line segment whose length is any positive rational number. 
Beginning with a line segment whose length is arbitrarily declared to be one unit, it is 
possible to construct segments representing 2, 3, 4, etc. by extending the given segment 
with a straight-edge, then twirling a compass around to tack the measured unit length 
onto itself end to end. It is also fairly easy to take a segment of length n and use similar 
triangles to construct a line segment of length 1/n. (See Fig. 2.3.) With another technique 
by which segments of length a and b can be used to construct a segment of length ab, all 
the positive rational lengths are constructible. (See Exercise 2.) Easy enough, right? 

The fact that segments of arbitrarily long length, say, n ∈ N, could be constructed
meant that segments of arbitrarily short length 1/n could also be constructed by the 
reciprocation technique. Such a short line segment could then, in theory, be added to 
itself as many times as needed to produce a segment of any arbitrary length m/n. The 
Greeks erroneously assumed the converse—that a segment of any constructible length 
could be constructed by imagining it to be a finite sum of very short segments of the 
form 1/n. Slap any segment down onto an imaginary piece of paper using techniques of 
construction. How long is it? It is some whole multiple of a (possibly very short) segment 
that can be constructed by reciprocating a segment of length n ∈ N. In modern language,
what number is represented by the length of a segment? Always a rational one. 

Figure 2.3 Constructing length 1/n using similar triangles. Construct segment AB of
length n. Construct segment AC of length 1. Construct segment BC. Construct segment FE
parallel to BC. Segment AE has length 1/n.

Now suppose we are given two segments of arbitrary lengths. If both of them can be
visualized in this way, what sort of relationship must exist between these two segments? In
the same way that we would find a common denominator of two fractions m/n and p/q,
our ancient mathematicians would say that there is a single segment, perhaps very short,
that can be attached to itself a finite number of times to produce each of the two given
segments. In our language, 1/(nq) can be added to itself mq times to produce m/n and np
times to produce p/q. The term that is used to describe this assumed relationship between
two segments is that they are commensurable. The Greeks believed that all constructible
segments are commensurable, which is equivalent to our believing that all real numbers
are rational.
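The common-measure idea can be checked with a few lines of Python. This is only a sketch: the function name common_measure and the sample lengths 3/4 and 5/6 are illustrative assumptions, not part of the text. It confirms that the unit 1/(nq) fits a whole number of times into both m/n and p/q.

```python
from fractions import Fraction

def common_measure(m, n, p, q):
    """Return (unit, count_a, count_b) for the lengths m/n and p/q.

    unit is 1/(n*q). Laying it end to end count_a = m*q times gives m/n,
    and count_b = n*p times gives p/q.
    """
    unit = Fraction(1, n * q)
    return unit, m * q, n * p

unit, count_a, count_b = common_measure(3, 4, 5, 6)
print(unit, count_a, count_b)            # 1/24 18 20
print(count_a * unit, count_b * unit)    # 3/4 5/6
```

The unit found this way is usually not the longest common measure, but the Greek argument only needs some common measure to exist.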

There was a problem with this and the Greeks eventually figured it out. Take a
segment of length one, then construct another segment of length one at one endpoint
and at a right angle to this first segment. Then sketch the hypotenuse. This hypotenuse line
segment is obviously constructible. We just described how to do it easily. Furthermore,
by the Pythagorean theorem, its length l satisfies l² = 2. The problem is that it is not
commensurable with some other constructible segments. That is, l is not rational. This is
the amazing result that the Pythagoreans discovered, and it created no small crisis. Given
that the theorem traditionally named after Pythagoras was already known, it boiled down
to this fact: Since “the sum of the [areas of the] squares on the legs of a right triangle
is equal to the [area of the] square on the hypotenuse” (see Fig. 2.4), then, as we would
write it, √2 is a constructible length. Therefore, by the Greek assumption, √2 is
commensurable with 1, or, in our language, √2 must be writable in the form p/q, where
p, q ∈ Z. The traditional proof that this is impossible is attributed to Aristotle. And, by
the way, this is the one you'll discover in Exercise 3, perhaps with a few hints.


Theorem 2.8.3. √2 is irrational.


A few words are in order about irrational numbers. Let’s work backwards from
Theorem 2.8.3. Theorem 2.8.3 says √2 is not rational, while Theorem 2.8.1 says √2
is real. Thus, irrational numbers do exist in R. However, Theorem 2.8.1 is based on
assumption A22, which we have said very little about. As we said, assumption A22 is
not an axiom of the real numbers. It can be proved from the Least Upper Bound axiom,
which is a standard axiom of R. What we discover is that the irrational numbers owe
their existence to the Least Upper Bound axiom.

Figure 2.4 A sketch of the Pythagorean theorem.


EXERCISES 


1. This exercise leads you through the proofs of claims X2–X3 from Theorem 2.8.1. We
begin with assumption A22, assuming that for every x > 0 and n ∈ N, the equation
yⁿ = x has some solution y ∈ R.

(a) Existence in X2(a): Let n ∈ N and x > 0. Suppose y₀ ∈ R is a solution to y²ⁿ = x.
(Notice y₀ ≠ 0.) Show that −y₀ is also a solution to y²ⁿ = x.⁴⁴

(b) Nonexistence in X2(b): Let n ∈ N and x < 0. Explain why there is no y ∈ R such
that y²ⁿ = x.

(c) Existence in X3 for x < 0: Let n ∈ N and x < 0. Prove that there exists a solution
y ∈ R to y²ⁿ⁺¹ = x.⁴⁵

(d) Nonexistence in X3 of solutions of opposite sign: Let n ∈ N, and let y₀ be a
solution to y²ⁿ⁺¹ = x. Prove that x > 0 if and only if y₀ > 0.

(e) Uniqueness in X2(a) and X3 (positive case): If y₁ > 0 and y₂ > 0 satisfy y₁ⁿ = x
and y₂ⁿ = x, then y₁ = y₂.⁴⁶

(f) Uniqueness in X3 (negative case): If y₁ < 0 and y₂ < 0 satisfy y₁²ⁿ⁺¹ = x and
y₂²ⁿ⁺¹ = x, then y₁ = y₂.⁴⁷


2. Devise a technique similar to that described in Fig. 2.3 by which a segment of length 
ab can be constructed from segments of length a and b. 


⁴⁴Exercise 8 from Section 2.4 should come in handy.

⁴⁵Use the fact that −x > 0.

⁴⁶Use the factorization formula in Exercise 11 from Section 2.4.

⁴⁷Apply part (e) to −y₁ and −y₂.

3. Prove Theorem 2.8.3: √2 is irrational. (Naturally, you'll want to assume it is rational
and arrive at a contradiction. Here are some hints if you need them.⁴⁸,⁴⁹,⁵⁰,⁵¹)


4. The proof of Theorem 2.8.3 can be easily generalized to demonstrate that √p is
irrational for any prime p ∈ N. Explain how your proof of Theorem 2.8.3 can be
adapted into a proof of this more general claim.⁵²


5. Suppose a, b, c, d ∈ Q and a + b√2 = c + d√2. Does it follow that a = c and
b = d?⁵³


6. Prove that the sum of a rational and an irrational must be irrational.⁵⁴
7. Prove that the product of a nonzero rational and an irrational must be irrational.


8. Suppose x, y > 0. Show that if x < y, then √x < √y.⁵⁵


2.9 Relations in General


There is another way to construct an equivalence relation that starts at a place different 
from that in Definition 2.5.1. A relation is a set construction that puts all kinds of element 
comparisons such as equality, less than (<), divisibility, and subset inclusion into one 
mathematical idea. It’s also a way of linking elements of two different sets together, and 
is a context in which functions can be defined. In this section, we'll define a relation and 
look at examples and special kinds of relations. First, however, we define the Cartesian 
product of two sets A and B by 


A × B = {(a, b) : a ∈ A, b ∈ B}, (2.86)


the set of ordered pairs where the first term in the pair is an element of A and the second
is an element of B. You might have seen the Cartesian plane written as R × R, or more
succinctly as R². A relation from A to B is defined to be any subset of A × B.


Example 2.9.1. For A = {1, 2, 3} and B = {11, 12, 13, 21, 22, 23, 31, 32, 33}, the set

R = {(1, 11), (2, 21), (2, 22), (3, 31), (3, 32), (3, 33)} (2.87)

is a relation from A to B.
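For finite sets, the Cartesian product and the phrase "any subset of A × B" can be spelled out directly. The following is a minimal Python sketch; the variable names are illustrative assumptions, and the particular pairs placed in R are just one possible choice of relation.

```python
from itertools import product

A = {1, 2, 3}
B = {11, 12, 13, 21, 22, 23, 31, 32, 33}

# The Cartesian product A x B: every ordered pair (a, b) with a in A and b in B.
A_times_B = set(product(A, B))

# A relation from A to B is any subset of A x B. This particular choice of
# pairs is only an illustration.
R = {(1, 11), (2, 21), (2, 22), (3, 31), (3, 32), (3, 33)}

print(len(A_times_B))            # 27 ordered pairs in all
print(R <= A_times_B)            # True: R is a subset of A x B
print({(1, 99)} <= A_times_B)    # False: 99 is not an element of B
```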


If a relation involves subsets of R, we can represent it graphically as a set of points in 
the xy-plane, just as in high school algebra. 


⁴⁸Write √2 = m/n, where you can assume no common factors between m and n.
⁴⁹Square both sides and look at the contrapositive of Corollary 2.7.5.

⁵⁰If m² is even, then .... What does this mean? Cancel out a 2.

⁵¹So n² is even. What contradiction does this cause?

⁵²Corollary 2.7.16 might help.

⁵³If b ≠ d, then what must be true of √2?

⁵⁴See Exercise 3h from Section 1.2.

⁵⁵Exercise 9i from Section 2.3 should make this quick.

Example 2.9.2. {(x, y) : y > |x| + 1} is a relation from R to R⁺ (Fig. 2.5).

Figure 2.5 The relation y > |x| + 1.


In this section, we’re going to delve only into subsets of A × A, which we call a relation
on A instead of a relation from A to A.

Example 2.9.3. Let A = {1, 2, 3, 4, 5, 6}. Then

R = {(1, 3), (1, 5), (2, 4), (2, 6), (3, 5), (4, 6)} (2.88)

is a relation on A.

You might think there’s nothing particularly exciting about a relation, for any
collection of ordered element pairs qualifies as one. What makes the idea take shape and
become mathematically important is that we can lay out certain criteria by which pairs
are included in the relation, and these different criteria might have particular properties
that make interesting statements about A.


Example 2.9.4. Define a relation R ⊆ Q × Q by R = {(p/q, r/s) : ps = qr}. This
relation consists of all pairs of rational numbers that are equal, as we defined equality in
Eq. (2.75). Thus, (3/8, −30/−80), (0/2, 0/12) ∈ R, while (2/5, 5/3) ∉ R.

Example 2.9.5. R = {(x, y) : x ≤ y} defines a relation on R (Exercise 1).

Example 2.9.6. R₁ = {(a, b) : a < b} and R₂ = {(a, b) : a | b} are relations on Z.

Example 2.9.7. For a set A, R = ∅ and R = A × A are relations on A.

The power set of a given set A is defined to be the family of all subsets of A. For
example, if A = {1, 2, 3}, the power set of A is

{∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}. (2.89)


We're saving a discussion of the number of elements in a set until Chapter 3, but notice
that A has three elements and the power set of A has eight elements, that is, 2³. This
motivates the notation 2^A to mean the power set of A.
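If you would like to see the count of subsets double with each new element, here is a small Python sketch. The helper name power_set is an assumption made for illustration; it uses frozenset so that subsets can themselves be elements of a set.

```python
from itertools import combinations

def power_set(A):
    """Return the family of all subsets of A as a set of frozensets."""
    elements = list(A)
    return {
        frozenset(subset)
        for size in range(len(elements) + 1)
        for subset in combinations(elements, size)
    }

P = power_set({1, 2, 3})
print(len(P))                         # 8
print(len(power_set({1, 2, 3, 4})))   # 16
print(len(power_set(set())))          # 1: the only subset of the empty set is itself
```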
Example 2.9.8. For A = {1, 2, 3}, define a relation on 2^A by R = {(A₁, A₂) : A₁ ⊆ A₂}.
Thus ({1}, {1}), (∅, A) ∈ R, but (A, {2}), ({1, 2}, ∅) ∉ R.


In Examples 2.9.4–2.9.8, a particular pair (x, y) from the set is included in the relation
if some stated property P(x, y) is true. That is, (x, y) ∈ R if x and y are related according
to some criterion. We give names to some relations when the criterion for inclusion of
ordered pairs in R has certain properties. Here’s an example.


Definition 2.9.9. Suppose R ⊆ A × A has the following properties:


(E1) (x, x) ∈ R for all x ∈ A (Reflexive property);
(E2) If (x, y) ∈ R, then (y, x) ∈ R (Symmetric property);
(E3) If (x, y), (y, z) ∈ R, then (x, z) ∈ R (Transitive property).


Then R is called an equivalence relation on A.


Notice how E1–E3 say the same things here as E1–E3 from Definition 2.5.1, but in
different language. Instead of saying that x = x for all x ∈ A is a true statement, we say
that all ordered pairs of the form (x, x) are in the relation. Instead of saying that the truth
of x = y implies the truth of y = x, now we say that the inclusion of the ordered pair
(x, y) in the relation is always accompanied by the inclusion of (y, x). And instead of
saying x = y and y = z implies x = z, we say that the presence of (x, y) and (y, z) in the
relation is always accompanied by the presence of (x, z). Thus an equivalence relation
on A is a subset of A × A with some special properties that motivate a partition of A.
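On a finite set, properties E1–E3 can be tested by brute force. The sketch below is one way to do it in Python; the function names and the two sample relations (same parity, and strict less-than) are assumptions chosen for illustration. The classes function groups each element with everything related to it, which gives a partition when the relation is an equivalence relation.

```python
def is_reflexive(R, A):
    return all((x, x) in R for x in A)

def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)

def is_transitive(R):
    return all((x, w) in R for (x, y) in R for (z, w) in R if y == z)

def is_equivalence(R, A):
    return is_reflexive(R, A) and is_symmetric(R) and is_transitive(R)

def classes(R, A):
    """Group each x in A with all y such that (x, y) is in R."""
    return {frozenset(y for y in A if (x, y) in R) for x in A}

A = {1, 2, 3, 4, 5, 6}
same_parity = {(x, y) for x in A for y in A if (x - y) % 2 == 0}
less_than = {(x, y) for x in A for y in A if x < y}

print(is_equivalence(same_parity, A))   # True
print(is_equivalence(less_than, A))     # False: not reflexive, not symmetric
print(classes(same_parity, A))          # the odd numbers and the even numbers
```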

Any time we define a term that describes how elements of a set may or may not
compare to each other, we can use that basis of comparison as a criterion for inclusion in a
relation. For example, the symbol ≤ denotes a way that two real numbers relate or compare
to each other. In Example 2.9.5, we used the statement x ≤ y as a criterion for inclusion in
a relation. The relation defined by ≤ has some special properties that motivate a definition
of another type of relation. Since it’s common to write xRy to mean (x, y) ∈ R, and say
that x is related to y instead of saying (x, y) is an element of the relation, we'll use this
somewhat more efficient notation in the next definition and example.


Definition 2.9.10. Suppose R is a relation on a set A with the following properties:


(O1) xRx for all x ∈ A (Reflexive property);
(O2) If xRy and yRx, then x = y (Antisymmetric property);
(O3) If xRy and yRz, then xRz (Transitive property).


Then R is called an order relation on A, or a partial ordering of A.


Example 2.9.11. Show that the relation in Example 2.9.5 is an order relation on R. 




Solution: We show that R has properties O1–O3.

(O1) Since x = x for all x ∈ R, we have that x ≤ x, so that xRx for all x ∈ R.

(O2) Suppose xRy and yRx. Then x ≤ y and y ≤ x, so that x = y.

(O3) Suppose xRy and yRz. Then x ≤ y and y ≤ z. Now if either x = y or y = z,
then x ≤ z by substitution. If, on the other hand, x < y and y < z, then x < z
by Exercise 9 from Section 2.3. In either case, xRz.


Since R has properties O1–O3, it defines an order relation on R. ■


We can be even more efficient with our language and notation than we were in
Example 2.9.11 by dispensing with the statement xRy and saying simply that ≤ defines
an order relation on R. That would make the preceding verification look like this:


Solution: We show that ≤ has properties O1–O3.

(O1) Clearly x ≤ x for all x ∈ R.

(O2) If x ≤ y and y ≤ x, then x = y.

(O3) Suppose x ≤ y and y ≤ z. If either x = y or y = z, then clearly x ≤ z. If, on
the other hand, x < y and y < z, then x < z from Exercise 9 in Section 2.3.
In either case, x ≤ z.


Since ≤ has properties O1–O3, it defines an order relation on R. ■


Verifying the claim in the next example will be quick if you reference the applicable 
results from Section 2.1 (Exercise 3). 


Example 2.9.12. If A is a set, then ⊆ defines an order relation on 2^A.


For the set {1, 2, 3}, the order relation defined by ⊆ can be illustrated with a directed
graph as in Fig. 2.6. The arrow from {1} to {1, 2} indicates that the former is related to
the latter by ⊆. Notice that {1, 2, 3} is reachable from {1} by a directed path by way of
either {1, 2} or {1, 3}, and either of these directed paths indicates that {1} ⊆ {1, 2, 3}.

Figure 2.6 Digraph of the partial order relation ⊆ on 2^{1,2,3}.
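The reachability remark can be tested mechanically. The Python sketch below assumes that the digraph draws an arrow from X to Y only when Y is X with exactly one more element; that reading of the figure is an assumption, as are the names arrows and reachable. Under it, following arrows recovers exactly the relation ⊆.

```python
from itertools import combinations

base = [1, 2, 3]
nodes = [frozenset(c) for k in range(len(base) + 1) for c in combinations(base, k)]

# Draw an arrow X -> Y only when Y is X plus exactly one new element.
# (For frozensets, X < Y means X is a proper subset of Y.)
arrows = [(X, Y) for X in nodes for Y in nodes if X < Y and len(Y) == len(X) + 1]

def reachable(start, goal):
    """True if goal can be reached from start by following zero or more arrows."""
    if start == goal:
        return True
    return any(reachable(Y, goal) for (X, Y) in arrows if X == start)

print(len(arrows))                                        # 12
print(reachable(frozenset({1}), frozenset({1, 2, 3})))    # True
print(reachable(frozenset({1}), frozenset({2, 3})))       # False
```

Drawing only these arrows keeps the picture uncluttered, since transitivity supplies the rest.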

You'll verify the claim of the next example in Exercise 4, then in Exercise 5 you'll 
sketch a directed graph that illustrates divisibility on a given subset of Z. 


Example 2.9.13. Divisibility defines an order relation on Z. 


Another type of relation is reminiscent of <. 


Definition 2.9.14. Suppose R is a relation on a set A with the following properties:

(S1) x R̸ x for all x ∈ A (Irreflexive property);

(S2) If xRy, then y R̸ x (Asymmetric property);

(S3) If xRy and yRz, then xRz (Transitive property).


Then R is called a strict order relation on A, or a strict ordering of A.


Example 2.9.15. Show that < defines a strict order relation on Z.


Solution: We show that < satisfies properties S1–S3.

(S1) Since x − x = 0, we have that x − x ≮ 0, so x ≮ x for all x ∈ Z.

(S2) Suppose x < y. Then y − x > 0, so that y − x ≮ 0. Thus y ≮ x.

(S3) From Exercise 9 from Section 2.3, if x < y and y < z, then x < z. ■


To this point we made no more than a passing reference in Chapter 0 to strict subset
inclusion ⊂. Saying that A₁ ⊂ A₂ creates a compound statement. First, if x ∈ A₁, then
x ∈ A₂. As well, there exists x ∈ A₂ such that x ∉ A₁. In Exercise 6, you'll show that ⊂
defines a strict order relation on 2^A.

One notable difference between ≤ and divisibility on Z is that any pair of integers
we choose is comparable in either one way or the other according to ≤. That is, for
all a, b ∈ Z, either a ≤ b or b ≤ a. For divisibility, however, there are many pairs of
integers that are not comparable at all. For example, neither 6 | 10 nor 10 | 6 is true. This
distinction motivates yet another type of relation.


Definition 2.9.16. Suppose R is a relation on a set A with the following properties:

(T1) xRx for all x ∈ A (Reflexive property);

(T2) If xRy and yRx, then x = y (Antisymmetric property);

(T3) If xRy and yRz, then xRz (Transitive property);

(T4) For all x, y ∈ A, either xRy or yRx.


Then R is called a total order relation on A, or a total ordering of A.




Notice that properties T1-T3 are the same as O1—O3, so a total order relation is a 
special kind of order relation where every pair of elements is comparable in one way or 
another. 


Example 2.9.17. On R, the relation ≤ is a total ordering.


Example 2.9.18. For A = {1, 2, 3}, ⊆ does not define a total order of 2^A, for {1, 2} ⊈
{2, 3} and {2, 3} ⊈ {1, 2}.
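The difference between an order relation and a total order relation shows up quickly on small finite sets. Here is a hedged Python sketch: the helper names is_order and is_total and the test sets are assumptions for illustration, and divisibility is tested on positive integers only.

```python
from itertools import combinations

def power_set(A):
    elements = list(A)
    return [frozenset(c) for k in range(len(elements) + 1)
            for c in combinations(elements, k)]

def is_order(rel, S):
    """Check O1-O3 for a relation given as a two-argument predicate on S."""
    reflexive = all(rel(x, x) for x in S)
    antisymmetric = all(x == y for x in S for y in S if rel(x, y) and rel(y, x))
    transitive = all(rel(x, z) for x in S for y in S for z in S
                     if rel(x, y) and rel(y, z))
    return reflexive and antisymmetric and transitive

def is_total(rel, S):
    """Check T4: every pair is comparable one way or the other."""
    return all(rel(x, y) or rel(y, x) for x in S for y in S)

def subset_of(X, Y):
    return X <= Y          # for frozensets, <= is subset inclusion

def divides(a, b):
    return b % a == 0      # a | b, used here for positive integers only

def leq(a, b):
    return a <= b

subsets = power_set({1, 2, 3})
positives = list(range(1, 13))
integers = list(range(-5, 6))

print(is_order(subset_of, subsets), is_total(subset_of, subsets))   # True False
print(is_order(divides, positives), is_total(divides, positives))   # True False
print(is_order(leq, integers), is_total(leq, integers))             # True True
```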


One of our assumptions from Chapter 0 is the WOP of W. To say that a is the smallest
element of a nonempty S ⊆ W is to say that a ≤ x for all x ∈ S. The last type of relation
we define is a generalization of the WOP of W.


Definition 2.9.19. Suppose R is a relation on a set A with the following properties:


(W1) xRx for all x ∈ A (Reflexive property);
(W2) If xRy and yRx, then x = y (Antisymmetric property);
(W3) If xRy and yRz, then xRz (Transitive property);


(W4) If S is any nonempty subset of A, then there exists a ∈ S such that aRx for all x ∈ S.


Then R is called a well order on A, or a well ordering of A.


Properties W1–W3 are the same as O1–O3, so a well ordering of A is an order relation.
Property W4 adds the feature that every nonempty subset of A contains what we might call
a least element. The antisymmetric property implies that such a least element is unique
(Exercise 8). You'll show in Exercise 9 that W4 implies T4, so that a well ordering is a
total ordering. However, a total ordering is not necessarily a well ordering. For example,
≤ is a total ordering of Z that is not a well ordering, as is illustrated by Z itself. If a ∈ Z
were a least element of Z, then a − 1 would be an integer for which a ≤ a − 1 is false. Since
W4 implies T4, but not vice versa, property W4 is stronger than T4.

Just because a given order relation on a set fails to be a well ordering, it does not
mean that the set cannot be well ordered by some other relation. In fact, one of the most
notable results in modern set theory is that any set can be well ordered. Ernst Zermelo
demonstrated this in 1904, using an axiom of set theory that we have said nothing about
so far in this text. The axiom of choice is a somewhat mysterious axiom of set theory that
is simple to state, but not often addressed at this level of the mathematical game, at least
not without some caveats and near apologies. It says “Given any family F of mutually
disjoint nonempty sets, there is a set S that contains a single element from each set in F.”
Thus S can be thought of as the result of having chosen a representative element from
each set in F. In the axiom of choice there is enough strength to demonstrate that for
any set, there exists a relation R that is a well ordering. One way to see how Z can be well
ordered is to list its elements as (0, −1, 1, −2, 2, −3, 3, ...) and define xRy if x does not
come after y in this listing. Ordering Z in this way, every nonempty subset has a least
element.
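The listing (0, −1, 1, −2, 2, −3, 3, ...) can be turned into a formula for the position of each integer, which makes the relation easy to experiment with. The sketch below is illustrative only: the names position, related, and least are assumptions, and least handles finite sets, though the argument for arbitrary nonempty subsets is the same because positions are whole numbers.

```python
def position(x):
    """Index of the integer x in the listing 0, -1, 1, -2, 2, -3, 3, ..."""
    return 2 * x if x >= 0 else -2 * x - 1

def related(x, y):
    """xRy: x does not come after y in the listing."""
    return position(x) <= position(y)

def least(S):
    """Least element, with respect to R, of a nonempty finite set of integers."""
    return min(S, key=position)

print([position(x) for x in (0, -1, 1, -2, 2, -3, 3)])   # [0, 1, 2, 3, 4, 5, 6]
print(least({-7, 5, -3, 4}))    # -3, the member that appears first in the listing
print(min({-7, 5, -3, 4}))      # -7, the least element under the usual order
```

Because position sends distinct integers to distinct whole numbers, the least element of a subset of Z under R is the one whose position is smallest, and the WOP of W guarantees that such a position exists.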
EXERCISES


1. For each of the relations on R, sketch the set of included points in the xy plane:
(a) {(x, y) : x ≤ y}
(b) {(x, y) : x² + y < 4}
(c) {(x, y) : |x| <
(d) {(x, y) : x² + y² > 1}
2. List all elements of the power set of the following sets:
(a) ∅
(b) {1}
(c) {1, 2}
(d) {1, 2, 3, 4}


3. Show that for any set A, ⊆ defines an order relation on 2^A.
4. Show that divisibility defines an order relation on Z.


5. Sketch a directed graph to illustrate the order relation of divisibility on {1, 2, 3, 4, 6, 8,
9, 12, 18, 24, 27, 36, 54}.


6. Show that for any set A, ⊂ defines a strict order relation on 2^A.


7. For a, b ∈ Z, write a <_Z b to mean a < b as in Example 2.9.15. For p/q, r/s ∈ Q
where 0 <_Z q and 0 <_Z s, define p/q <_Q r/s if ps <_Z qr. Use the fact that <_Z is a
strict order relation on Z to show that <_Q is a strict order relation on Q.


8. Show that the least element of a well-ordered set is unique. 


9. Show that a well ordering is a total ordering by showing that property W4 implies T4. 


CHAPTER 3

Functions


3.1 Definitions and Terminology


Second only to sets, functions are likely the most important mathematical concept. In 
reality, functions can be defined solely in terms of sets, and some authors take that 
approach. So it’s arguable that sets are the heart of all mathematics. In this section, 
we define the term function and study some examples, and then we expand on some 
important terminology. 


3.1.1 Definition and examples 


Definition 3.1.1. Given two nonempty sets A and B, a function f is a rule, or set of
instructions, by which each element of A is paired with exactly one element of B. A is called the
domain, and is denoted Dom f. B is called the codomain. Notationally, if f is a function from
A to B, we write f : A → B. If a ∈ A is paired by f with b ∈ B, we write f(a) = b or
a ↦ b. We say that b is the image of a, or that a maps to b, and a is a pre-image of b. The
subset of B consisting of the images of all elements of A is called the range, and is denoted
Rng f. (See Fig. 3.1 for a basic sketch.)


Another way to define a function is in terms of a relation, that is, a subset of A x B. 
Instead of imagining elements of A being associated with elements of B via the input- 
output imagery of Definition 3.1.1, it’s possible to define a function as a set of pairs (a, b), 
where a € A andb € B, and with some additional restrictions we'll mention in what 
follows. We also use the term mapping to refer to any pairing of the elements of two sets. 
Thus a function is also a mapping with some special restrictions. 
Figure 3.1 Schematic of a function: f : A → B.


Let’s present some clarifying thoughts and examples to give us a handle on
Definition 3.1.1. First, for a mapping f : A → B to be a function, every a ∈ A must have an
image b ∈ B. That is:


(F1) For every a ∈ A, there exists b ∈ B such that f(a) = b.


In addition to property F1, the rule defining f must produce a unique f(a) for all
a ∈ A. Mathematically, we say that f is well defined if f(a) is unique for every a ∈ A.
How can we say mathematically that the image of a ∈ A must be unique? (See Fig. 3.2
for a sketch of what must not happen.) One way to say that f is well defined is to say


(F2) If b₁, b₂ ∈ B are such that f(a) = b₁ and f(a) = b₂, then b₁ = b₂.


Figure 3.2 The image of a is not unique. 




At first glance, F2 sounds like a statement that would be true for any mapping. After
all, how could b₁ ever be different from b₂ when all you have to do is apply the transitive
property of equality to the two equations f(a) = b₁ and f(a) = b₂? Unfortunately,
this way of expressing the uniqueness of f(a) doesn’t reveal possible problems that can
prevent f from being well defined. Here are two examples that illustrate how a mapping
can fail to be well defined.


Example 3.1.2. It’s possible that the rule defining f : A → B can be ambiguous. For
example, let f : R⁺ ∪ {0} → R be defined in the following way. For x ∈ R⁺ ∪ {0},
define f(x) to be a solution y to the equation y² = x. You cannot doubt that this is a set
of instructions for generating f(x) from x, that the generated f(x) is in R, and that it
works for every x in the domain. However, the rule is ambiguous for every x ≠ 0. Letting
x = 25, y₁ = 5, and y₂ = −5, we have demonstrated the existence of y₁, y₂ ∈ R where


f(x) = y₁, f(x) = y₂, and y₁ ≠ y₂.


Example 3.1.2 illustrates a possible pitfall you might run into with the notation f(x) 
if you haven't verified that f is well defined. Unless you know that f is well defined, you 
might find yourself using an expression like f (25) as one name for more than one thing. 
Make sure that the set of instructions detailing how f(x) is to be determined doesn’t 
produce more than one value for any x in the domain. 

Here’s another example of how a mapping can fail to be well defined, this time because
a single element in the domain might have more than one distinct name.


Example 3.1.3. Define f : Q → Z in the following way. For p/q ∈ Q, define f(p/q) =
p. That is, the image of a rational number written in a standard form of integer over integer
is defined to be the numerator. Unlike Example 3.1.2, there is no ambiguity concerning
what f(p/q) is. The problem here is that the standard definition of equality in the
rationals (=_Q, as discussed in Section 2.6.1) lumps a lot of different expressions of the
form p/q into the same equivalence class. As a specific example, although 2/5 = 6/15,
f(2/5) ≠ f(6/15). The fact that one domain element can be addressed by more than
one name causes problems, because the image of x ∈ Q depends on the form it’s written
in. Thus f is not well defined because we have exhibited a ∈ Q and b₁, b₂ ∈ Z where
f(a) = b₁, f(a) = b₂, but b₁ ≠ b₂.
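The trouble with the numerator rule is easy to reproduce on a computer if rationals are stored as written, that is, as pairs (p, q). The Python sketch below does this; the names equal_in_Q and numerator are assumptions for illustration. For contrast, it also uses the standard library's Fraction, which reduces to lowest terms before reporting a numerator and so gives one answer per rational number.

```python
from fractions import Fraction

def equal_in_Q(a, b):
    """(p, q) and (r, s) name the same rational exactly when ps = qr."""
    (p, q), (r, s) = a, b
    return p * s == q * r

def numerator(a):
    """The rule f(p/q) = p, applied to the written form (p, q)."""
    p, q = a
    return p

a1, a2 = (2, 5), (6, 15)
print(equal_in_Q(a1, a2))              # True: two names for one rational
print(numerator(a1), numerator(a2))    # 2 6: the rule disagrees with itself

# Reducing to lowest terms first gives a rule that is well defined.
print(Fraction(2, 5).numerator, Fraction(6, 15).numerator)   # 2 2
```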


Example 3.1.3 illustrates a good way to restate property F2. If a₁ and a₂ are two names
for the same thing, that is if a₁ = a₂, then the rule must produce the same functional
value for a₁ and a₂. That is, f(a₁) = f(a₂).


(F2) If a₁, a₂ ∈ A and a₁ = a₂, then f(a₁) = f(a₂).


In the language of relations, properties F1 and F2 translate to the following. If a
relation R ⊆ A × B is a function, property F1 says that, for all a ∈ A, there must exist
b ∈ B such that (a, b) ∈ R. Property F2 says that if (a, b₁), (a, b₂) ∈ R, then b₁ = b₂.

Suppose someone gives us two sets A and B, and a rule f for pairing elements of A
with elements of B. If we are asked to verify that f is a function, then we must verify that
F1 and F2 hold.
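When A and B are finite and the rule is given as a set of ordered pairs, F1 and F2 can be verified by inspection. The following Python sketch shows one way to do that; the function name is_function and the small sets used are assumptions for illustration. The square_roots relation mimics the ambiguity of Example 3.1.2 on a finite domain.

```python
def is_function(R, A, B):
    """Check F1 and F2 for a relation R given as a set of pairs in A x B."""
    in_product = all(a in A and b in B for (a, b) in R)
    # F1: every a in A has at least one image.
    f1 = all(any(x == a for (x, _) in R) for a in A)
    # F2: no a in A has two different images.
    f2 = all(b1 == b2 for (a1, b1) in R for (a2, b2) in R if a1 == a2)
    return in_product and f1 and f2

A = {0, 1, 4}
B = {-2, -1, 0, 1, 2}

square_roots = {(x, y) for x in A for y in B if y * y == x}
principal = {(x, y) for (x, y) in square_roots if y >= 0}
partial = {(0, 0), (1, 1)}

print(is_function(square_roots, A, B))   # False: F2 fails, since 1 pairs with 1 and -1
print(is_function(principal, A, B))      # True
print(is_function(partial, A, B))        # False: F1 fails, since 4 has no image
```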
Example 3.1.4. Show that f : R\{1} → R defined by

f(x) = (x² + 2)/(x − 1)

is a function.


Solution:


(F1) Pick any x ∈ R\{1}. Then by properties A3 and A9 (closure in R of addition
and multiplication, respectively), x² + 2 and x − 1 exist in R. Furthermore,
because x ≠ 1, then x − 1 ≠ 0, so that (x − 1)⁻¹ ∈ R by property A13
(existence of multiplicative inverses). Finally, (x² + 2) · (x − 1)⁻¹ ∈ R by
closure of multiplication. Thus f(x) exists in R.


(F2) Pick x₁, x₂ ∈ R\{1} and suppose x₁ = x₂. Then by properties A2 and A8
(addition and multiplication in R are well defined), it follows that x₁² + 2 =
x₂² + 2 and x₁ − 1 = x₂ − 1. By the uniqueness of multiplicative inverses,
(x₁ − 1)⁻¹ = (x₂ − 1)⁻¹. Finally, applying property A8 again, (x₁² + 2)/(x₁ − 1) =
(x₂² + 2)/(x₂ − 1), so that f(x₁) = f(x₂). ■


Example 3.1.4 illustrates a broad result that is fairly easy to see using only the example.
If f is a rule that can be written in the form of polynomial over polynomial, that is, if

f(x) = (aₙxⁿ + aₙ₋₁xⁿ⁻¹ + ⋯ + a₁x + a₀)/(bₘxᵐ + bₘ₋₁xᵐ⁻¹ + ⋯ + b₁x + b₀), (3.1)

where aᵢ, bⱼ ∈ R (0 ≤ i ≤ n, 0 ≤ j ≤ m), then, assuming we avoid values of x that make
the denominator zero, f is a (well-defined) function from an appropriate subset of R to R.
Here’s another way to say it. Since + and × are well defined on R, and additive and
multiplicative inverses are unique, then any mapping whose rule is built up from the operations
of +, −, ×, and ÷, and whose domain avoids any occurrence of 0⁻¹ will be well defined.

We can go even further. Thanks to Theorem 2.8.1 and Definition 2.8.2, we have
ensured that ⁿ√x is well defined for all x ∈ R if n is odd, and for all x ≥ 0 if n is even.
Therefore, we can say that any real-valued mapping whose rule is built up algebraically
with +, −, ×, ÷, ( )ⁿ, and ⁿ√( ), and whose domain is a subset of R that avoids 0⁻¹ and
even roots of negative numbers will be well defined. For example,


1+ ./x3 — 1/( — 3) 


x —1+x>5 


fayastrey (3.2) 
is a (well-defined) function whose domain is some hideous subset of R. Functions built 
up as is f in Eq. (3.2) are called algebraic. 

There is a simple function that will come in handy in Section 3.3. In its simplicity it 
illustrates the sorts of things we have to show when verifying that a mapping is a function 
and when showing a function has certain properties. 


Example 3.1.5. Let n ∈ N and m ∈ Z, and write Nₙ = {1, 2, …, n} and Nₙᵐ =
{1 + m, 2 + m, …, n + m}. Define a translation mapping T : Nₙ → Nₙᵐ by T(x) =
x + m. Show T is a function.




Solution: We show that T satisfies properties F1 and F2.


(F1) Pick x ∈ Nₙ. Since 1 ≤ x ≤ n, it follows that 1 + m ≤ x + m ≤ n + m, or
1 + m ≤ T(x) ≤ n + m. Thus T(x) ∈ Nₙᵐ.


(F2) Pick x₁, x₂ ∈ Nₙ and suppose x₁ = x₂. Then since addition in R is well defined,
T(x₁) = x₁ + m = x₂ + m = T(x₂). Thus T is well defined on Nₙ. ■


3.1.2 Other terminology and notation 
There is a wealth of terminology surrounding functions, and we present some of it 
now. As you read the informal introductions to the terms, you should try to construct 
formal definitions yourself for practice. After you've tried to do it yourself, then read the 
definitions provided. 

First we ask what meaning we would like to assign to the statement f = g. The 
standard definition is the following. 


Definition 3.1.6. Two functions f and g are said to be equal if they have the same domain
and codomain, and f(a) = g(a) for all a in the domain.


It’s a quick mental exercise to see that equality of functions, as given in Definition 3.1.6,
is an equivalence relation. For example, to show E1, we note that f has the same domain
and codomain as itself, and f(a) = f(a) for all a in the domain. Showing E2 and E3 is
equally easy.

If f : A → B is a function, we would like to create some notation to denote the
image, not of a single element, but of a subset A₁ ⊆ A of the domain. The notation
we use is f(A₁). Clearly, it is a subset of the range. How should we define f(A₁) for
some A₁ ⊆ A? To put it another way, under what conditions is y ∈ f(A₁)? Here’s the
definition:


Definition 3.1.7. Suppose f : A → B is a function and A₁ ⊆ A. Then the image of A₁,
denoted f(A₁), is defined by


f(A₁) = {y ∈ B : (∃x ∈ A₁)(y = f(x))}. (3.3)
Thus y ∈ f(A₁) if and only if there exists x ∈ A₁ such that y = f(x).


With the notation of Definition 3.1.7, we can then rigorously define the range of
f : A → B by


Rng f = f(A) = {y ∈ B : (∃x ∈ A)(y = f(x))}. (3.4)
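For a finite domain, the image of a subset and the range can be computed exactly as Definition 3.1.7 and Eq. (3.4) describe. Here is a short Python sketch; the helper name image and the sample function f(x) = x² − 1 on the integers from −3 to 3 are assumptions chosen for illustration.

```python
def image(f, A1):
    """f(A1): the set of all values f(x) with x in A1."""
    return {f(x) for x in A1}

def f(x):
    return x * x - 1

A = set(range(-3, 4))          # the domain {-3, ..., 3}

print(image(f, {-1, 0, 1}) == {-1, 0})     # True
print(image(f, A) == {-1, 0, 3, 8})        # True: this is Rng f for this domain
print(image(f, {-1, 0, 1}) <= image(f, A)) # True: f(A1) is a subset of the range
```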


Property F1 says that a function f must map every element of A to some element
of B. Property F2 says that the path from a ∈ A to f(a) ∈ B must not have a fork in
the road, as was illustrated in Fig. 3.2. Two other terms relate to the possibility that a
function f : A → B might have analogous features when viewed, as it were, in the
reverse direction. Analogous to every a ∈ A having an image, perhaps every b ∈ B has
a pre-image. Also, analogous to the uniqueness of the image of a ∈ A, perhaps every
b ∈ Rng f has a unique pre-image. A function with the former feature is called onto.
A function with the latter feature is called one-to-one. The merging of paths from
different elements of A to the same image in B, which is forbidden in a one-to-one
function, is illustrated in Fig. 3.3.

Figure 3.3 This function is not one-to-one: f(a₁) = f(a₂).

So how shall we define the terms onto and one-to-one? Think about it before you read 
the definitions that follow. 


Definition 3.1.8. Suppose f : A → B is a function. Then f is said to be onto provided
for every b ∈ B, there exists a ∈ A such that f(a) = b. That is,


f is onto ⇔ (∀y ∈ B)(∃x ∈ A)(y = f(x)). (3.5)


As a notational shorthand, we write f : A —onto→ B. An onto function is also called a surjection.


Example 3.1.9. Show that f : R → R defined by f(x) = x³ − 1 is a surjection.


Solution: Memories from pre-calculus resurface, and we recall that the graph of a 
polynomial function of odd degree extends all the way up and down the y-axis, so 
that everything in the codomain R has a pre-image. But how do we prove f is onto? 
We have to choose an arbitrary y € R and work backwards to find some x € R that 
maps to it. If y € R is chosen arbitrarily, what does x need to be so that y = f(x) 
will be true? Assuming you found it, here’s a cleaned up proof. 


3.1 Definitions and Terminology 


Pick y € R. Let x = \/y + I. Then x is, in fact, a real number (Theorem 2.8.1), 
and 


fax) = fWy+D=(Vyt1P-l=y+i-l=y. (3.6) 


Thus f is onto. a 


Example 3.1.10. Show that the function $T$ from Example 3.1.5 is a surjection.

Solution: Pick $y \in \mathbb{N}_n^m$. Let $x = y - m$. Since $1 + m \le y \le n + m$, we have that
$1 \le x \le n$, so that $x \in \mathbb{N}_n$. Also, $T(x) = x + m = y$. Thus $T$ is onto. ■


Before you read the definition of one-to-one, think about uniqueness of a pre-image 
and try to come up with your own definition. 


Definition 3.1.11. Suppose $f : A \to B$ is a function. Then $f$ is said to be one-to-one
provided $[f(a_1) = f(a_2)] \to [a_1 = a_2]$. Or, if you prefer,

\[ f \text{ is one-to-one} \iff (\forall a_1, a_2 \in A)([f(a_1) = f(a_2)] \to [a_1 = a_2]). \tag{3.7} \]

As a notational shorthand, we write $f : A \xrightarrow{\text{1-1}} B$. A one-to-one function is also called an
injection.


Example 3.1.12. Show that the function $T$ from Example 3.1.5 is an injection.

Solution: Pick $x_1, x_2 \in \mathbb{N}_n$ and suppose that $T(x_1) = T(x_2)$. Then $x_1 + m =
x_2 + m$, so that $x_1 = x_2$ by cancellation. Thus $T$ is one-to-one. ■

If $f : A \to B$ is both one-to-one and onto, we may write $f : A \xrightarrow[\text{onto}]{\text{1-1}} B$. A one-to-one,
onto function is also called a bijection. A bijection from $A$ to $B$ is said to put the sets $A$
and $B$ into a one-to-one correspondence.

Perhaps the simplest example of a one-to-one correspondence is the identity function
$i : A \to A$ defined by $i(a) = a$ for all $a \in A$. The need for this ho-hum function arises
more often than you might think, and the fact that it is a one-to-one, onto function from
a set $A$ to itself must be demonstrated. You'll do that in Exercise 5.
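On finite sets, the definitions of onto and one-to-one can be tested by brute force. The sketch below is an illustration under stated assumptions: functions are stored as Python dictionaries, the helper names are hypothetical, and the shift $T(x) = x + m$ is tried with the arbitrary sample values $n = 4$ and $m = 10$.

```python
def is_onto(f, codomain):
    """True when every element of the codomain has a pre-image."""
    return set(codomain) <= set(f.values())

def is_one_to_one(f):
    """True when no two domain elements share an image."""
    return len(set(f.values())) == len(f)

def is_bijection(f, codomain):
    """True when f is both one-to-one and onto the given codomain."""
    return is_onto(f, codomain) and is_one_to_one(f)

# Hypothetical data: the shift T(x) = x + m on {1, ..., n}.
n, m = 4, 10
T = {x: x + m for x in range(1, n + 1)}
shifted = range(1 + m, n + m + 1)      # the set {1 + m, ..., n + m}

# The identity on a small set, stored the same way.
identity = {a: a for a in {"p", "q", "r"}}
```

A finite check of this kind is evidence for one particular function, not a proof of the general statements in the examples above.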


3.1.3 Three important theorems 


The following theorems are not the prettiest ones you'll ever see in your mathematical life, 
but they’re not the ugliest either. They're not complicated in principle, but proving them 
requires slavish attention to some rather minute details. If nothing else, they illustrate 
the inescapable fact that laying the groundwork for later, more elegant results sometimes 
requires you to muddle your way through preliminary theorems whose proofs are sticky 
and might involve multiple cases. You'll prove the first of these theorems in Exercise 10. 
It comes in handy in proving the other two.

Theorem 3.1.13. Suppose $f_1 : A_1 \xrightarrow[\text{onto}]{\text{1-1}} B_1$ and $f_2 : A_2 \xrightarrow[\text{onto}]{\text{1-1}} B_2$ are functions, and that
$A_1 \cap A_2 = B_1 \cap B_2 = \emptyset$. Define $f : A_1 \cup A_2 \to B_1 \cup B_2$ by

\[ f(x) = \begin{cases} f_1(x), & \text{if } x \in A_1; \\ f_2(x), & \text{if } x \in A_2. \end{cases} \]

Then $f$ is a one-to-one, onto function from $A_1 \cup A_2$ to $B_1 \cup B_2$.


Theorem 3.1.13 says that if you're given two one-to-one, onto functions defined from
disjoint domains to disjoint codomains, then it’s possible to paste together the domains, 
paste together the codomains, and then define a single function between these new sets 
that is also one-to-one and onto. 
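The pasting construction of Theorem 3.1.13 is easy to try on small finite sets. This is a minimal sketch, assuming functions are stored as Python dictionaries; the name `paste` and the sample data are hypothetical.

```python
def paste(f1, f2):
    """Combine two dict-functions with disjoint domains into one function."""
    if set(f1) & set(f2):
        raise ValueError("domains must be disjoint")
    return {**f1, **f2}

f1 = {1: "a", 2: "b"}   # a bijection from {1, 2} onto {"a", "b"}
f2 = {3: "c"}           # a bijection from {3} onto {"c"}
f = paste(f1, f2)       # defined on {1, 2, 3}, with values in {"a", "b", "c"}
```

The sketch checks only that the domains are disjoint. Disjointness of the codomains is what keeps the pasted function one-to-one, and that is part of what the proof in Exercise 10 must establish.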

We'll prove the next theorem here, for it’s a little sticky and will prove to be a good 
warm-up for proving the third. 


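Theorem 3.1.14. Let $n \in \mathbb{N}$, and let $S \subseteq \mathbb{N}_n$ be any nonempty set. Then there exists a
natural number $m \le n$ and some $f : S \to \mathbb{N}_m$ where $f$ is a one-to-one, onto function.

Proof: We prove by induction on $n \ge 1$.

(I1) If $n = 1$, then $S = \mathbb{N}_1 = \{1\}$. Letting $m = 1$ we may define $f : S \to \mathbb{N}_m$ by
$f(1) = 1$, which is clearly a one-to-one, onto function.

(I2) Suppose $n \ge 1$ and that the result is true for any nonempty subset of $\mathbb{N}_n$. Let $S$
be any nonempty subset of $\mathbb{N}_{n+1}$, and consider the following three cases.

If $S = \mathbb{N}_{n+1}$, then we may let $m = n + 1$ and $f : S \to \mathbb{N}_m$ be the identity
function.

If $S \ne \mathbb{N}_{n+1}$ and $n + 1 \notin S$, then $S \subseteq \mathbb{N}_n$. Thus, the inductive assumption
applies, and there exists $m \le n$ and a function $f : S \xrightarrow[\text{onto}]{\text{1-1}} \mathbb{N}_m$.

If $S \ne \mathbb{N}_{n+1}$ and $n + 1 \in S$, then there exists $k \in \mathbb{N}_n$ such that $k \notin S$.
Let $T = (S \cup \{k\}) \setminus \{n + 1\}$. (Draw a picture!) Then $T \subseteq \mathbb{N}_n$, and the inductive
assumption applies to yield $m \le n$ and a function $f : T \xrightarrow[\text{onto}]{\text{1-1}} \mathbb{N}_m$. Define
$g : S \to \mathbb{N}_m$ by

\[ g(x) = \begin{cases} f(x), & \text{if } x \in T \setminus \{k\}; \\ f(k), & \text{if } x = n + 1. \end{cases} \]

Defining $A_1 = T \setminus \{k\}$, $B_1 = \{n + 1\}$, $A_2 = \mathbb{N}_m \setminus \{f(k)\}$, and $B_2 = \{f(k)\}$, so
that $g$ maps $A_1$ onto $A_2$ and $B_1$ onto $B_2$, we may apply Theorem 3.1.13 to
conclude that $g : S \to \mathbb{N}_m$ is one-to-one and onto. ■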

The last theorem is a sort of nonexistence theorem whose importance will become
clear in Section 3.3. You'll prove it in Exercise 11.

Theorem 3.1.15. Suppose $m, n \in \mathbb{N}$ and $m < n$. Then there is no function $f : \mathbb{N}_n \xrightarrow[\text{onto}]{\text{1-1}} \mathbb{N}_m$.


EXERCISES 


1. What does it mean to say that a function is not onto? 
2. What does it mean to say that a function is not one-to-one? 
3. Here are several pairs of sets. 
(a) {1, 2, 3, 4} and {a, b, c} 
(b) $\{1, 2, 3, 4\}$ and $\mathbb{N}$
(c) $\mathbb{Z}$ and $\mathbb{Z}$
(d) $\mathbb{Z}$ and $\mathbb{N}$
(e) $\mathbb{R}$ and $\mathbb{R}$
For each of the preceding pairs of sets, find four functions $\{f_1, f_2, f_3, f_4\}$ from the
first set to the second set with the following properties, if such functions are possible.
(i) $f_1$ is one-to-one but not onto.
(ii) $f_2$ is onto but not one-to-one.
(iii) $f_3$ is both one-to-one and onto.
(iv) $f_4$ is neither one-to-one nor onto.
Be prepared to provide as much explanation as necessary to support your claim.
4. Prove that the function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^3 - 1$ is one-to-one.

5. Show that the identity mapping $i : A \to A$ is a one-to-one function from $A \ne \emptyset$
onto itself.


6. Suppose $f : A \to B$ is a function and $A_1, A_2 \subseteq A$. Prove the following, or disprove
it by providing a counterexample. If the claim is not true, is at least one direction of
subset inclusion true?
(a) $f(A_1 \cap A_2) = f(A_1) \cap f(A_2)$.¹
(b) $f(A_1 \cup A_2) = f(A_1) \cup f(A_2)$.
7. From the parts of Exercise 6 that are valid, state analogous theorems for a family of
sets $\mathcal{F} = \{A\}$.

8. This exercise is a return to Exercise 6a, which we hope you have already solved.
Suppose $f : A \xrightarrow{\text{1-1}} B$. Show that $f(A_1 \cap A_2) = f(A_1) \cap f(A_2)$.

¹See Exercise 8 if you have to.

9. In this exercise we want to consider functions $f : S \to \mathbb{R}$ where $S \subseteq \mathbb{R}$ and $f$ is of
the form

\[ f(x) = \frac{ax + b}{cx + d}, \tag{3.9} \]

where $c$ and $d$ are not both zero. Clearly since $f$ is algebraic, it is well defined on
whatever subset of $\mathbb{R}$ prevents $cx + d = 0$.
(a) Under what conditions (i.e., restrictions on $a, b, c, d$) will $f$ be one-to-one?²
(b) Given that the restriction you discovered in part (a) holds, $f$ is a one-to-one
function. Consider the two cases $c = 0$ and $c \ne 0$.
i. Show that if $c = 0$, then $f$ is defined on all of $\mathbb{R}$ and is onto.
ii. Show that if $c \ne 0$, then there exist $x_0, y_0 \in \mathbb{R}$ such that $f$ is defined on
$\mathbb{R} \setminus \{x_0\}$ and $f$ is onto $\mathbb{R} \setminus \{y_0\}$.
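10. Prove Theorem 3.1.13: Suppose $f_1 : A_1 \xrightarrow[\text{onto}]{\text{1-1}} B_1$ and $f_2 : A_2 \xrightarrow[\text{onto}]{\text{1-1}} B_2$ are functions,
and that $A_1 \cap A_2 = B_1 \cap B_2 = \emptyset$. Define $f : A_1 \cup A_2 \to B_1 \cup B_2$ by

\[ f(x) = \begin{cases} f_1(x), & \text{if } x \in A_1; \\ f_2(x), & \text{if } x \in A_2. \end{cases} \]

Then $f$ is a one-to-one, onto function from $A_1 \cup A_2$ to $B_1 \cup B_2$.
11. Prove Theorem 3.1.15 by induction on $m \ge 1$ in the following way:
(a) Show that there is no such function for the case $m = 1$.
(b) Use the contrapositive to prove the inductive step: If there exists $f : \mathbb{N}_{n+1} \xrightarrow[\text{onto}]{\text{1-1}} \mathbb{N}_{m+1}$
for some $1 \le m < n$, then there exists $g : \mathbb{N}_n \xrightarrow[\text{onto}]{\text{1-1}} \mathbb{N}_m$.³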


3.2 Composition and Inverse Functions

One way to think of a function is as a linking of a set $A$ to a set $B$ with certain restrictions.
If $A$ is linked to $B$ by a function $f$, and $B$ is linked to $C$ by a function $g$, then we might
want to look into how to link $A$ to $C$ via $B$, using $f$ and $g$ in succession. Composition
is the term we use when combining functions in this way. It might remind you a little
of the transitive property, whereby some relationship between $a$ and $b$, in conjunction
with a relationship between $b$ and $c$, yields the same relationship between $a$ and $c$. Then,
reminiscent of the symmetric property, whereby $a = b$ implies $b = a$, we study the
possibility of using $f : A \to B$ to link $B$ back to $A$ by reversing the function $f$. Such a
reverse linking is called the inverse of $f$.


3.2.1 Composition of functions

Suppose $f : A \to B$ and $g : B \to C$ are functions. We want to define a new mapping
$h : A \to C$ that uses the rules of $f$ and $g$ together as in Fig. 3.4.

²If the statement $f(x_1) = f(x_2) \to x_1 = x_2$ is going to be a true statement, what must be assumed
about $a, b, c, d$ to make it so? Play with the statement $f(x_1) = f(x_2)$.

³If $f(n + 1) = m + 1$, then defining $g$ should be easy. Otherwise, $f(n + 1) = k$ for some $1 \le k \le m$
and $f(l) = m + 1$ for some $1 \le l \le n$. Define $g$ to map $l$ to $k$.

Figure 3.4 Composition: $(g \circ f) : A \to C$.

Definition 3.2.1. If $f : A \to B$ and $g : B \to C$ are functions, we define the composition
$g \circ f : A \to C$ to be the mapping defined by $(g \circ f)(a) = g(f(a))$ for all $a \in A$.


Example 3.2.2. Consider the functions $f : \mathbb{R} \to \mathbb{R}^+ \cup \{0\}$ defined by $f(x) = x^2$ and
$g : \mathbb{R}^+ \cup \{0\} \to \mathbb{R}^+ \cup \{0\}$ defined by $g(x) = \sqrt{x}$. Then $(g \circ f)(x) = \sqrt{x^2}$. This is
another way to define $|x|$.

Notice we did not presume to say that $g \circ f$ is a function in Definition 3.2.1. We need
to use the fact that $f$ and $g$ are themselves functions to show that properties F1 and F2
hold for $g \circ f$ (Exercise 1).

Composition should not be new to you. You used it extensively in calculus, and the
chain rule is the technique you used to differentiate a function that was composed of
other functions. The questions we want to address here involve relationships between
$f$, $g$, and $g \circ f$ based on their individual characteristics. For example, if $f$ and $g$ are
one-to-one, does it follow that $g \circ f$ is one-to-one? What about onto? What if you know
something about $f$ and $g \circ f$? Can you conclude anything about $g$?

Given $f$, $g$, and $g \circ f$, and given the characteristics of one-to-one and onto, let's
put them together in all possible combinations and consider the following six possible
theorems. Given $f : A \to B$ and $g : B \to C$:


(Q1) If $f$ and $g$ are onto, then $g \circ f$ is onto.
(Q2) If $f$ and $g \circ f$ are onto, then $g$ is onto.
(Q3) If $g$ and $g \circ f$ are onto, then $f$ is onto.
(Q4) If $f$ and $g$ are one-to-one, then $g \circ f$ is one-to-one.
(Q5) If $f$ and $g \circ f$ are one-to-one, then $g$ is one-to-one.
(Q6) If $g$ and $g \circ f$ are one-to-one, then $f$ is one-to-one.

Some of the statements Q1–Q6 are true, and some are false. In fact, some might be
true even if one of the hypothesis conditions is omitted. To try to prove or disprove them,
a picture might be helpful. Counterexamples can be constructed easily if you let simple
pictures with simple sets inspire them. We'll prove that Q4 is true and then you'll attack
the rest in Exercise 2.

Theorem 3.2.3. If $f : A \to B$ and $g : B \to C$ are one-to-one functions, then $g \circ f :
A \to C$ is one-to-one.

Proof: Suppose $f$ and $g$ are one-to-one, and suppose $(g \circ f)(a_1) = (g \circ f)(a_2)$.
Thus $g[f(a_1)] = g[f(a_2)]$. Since $g$ is one-to-one, we have $f(a_1) = f(a_2)$. Then,
since $f$ is one-to-one, it follows that $a_1 = a_2$. Thus $g \circ f$ is one-to-one. ■
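Composition of finite functions can be computed directly from Definition 3.2.1. The sketch below assumes functions are stored as Python dictionaries; the helper names and sample data are hypothetical, and the check at the end illustrates Theorem 3.2.3 on one instance rather than proving it.

```python
def compose(g, f):
    """Return g o f as a dict: (g o f)(a) = g(f(a)) for every a in the domain of f."""
    return {a: g[f[a]] for a in f}

def is_one_to_one(h):
    """True when no two domain elements share an image."""
    return len(set(h.values())) == len(h)

# Hypothetical data: f and g are both one-to-one.
f = {1: "x", 2: "y"}
g = {"x": 10, "y": 20, "z": 30}
g_after_f = compose(g, f)

# Both hypotheses and the conclusion of Theorem 3.2.3, checked on this instance.
hypotheses_hold = is_one_to_one(f) and is_one_to_one(g)
conclusion_holds = is_one_to_one(g_after_f)
```

The same two helpers can be used to hunt for small counterexamples to the false statements among Q1–Q6.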


3.2.2 Inverse functions

If $f : A \to B$ is a function, then the rule linking $A$ to $B$ is well defined on all of $A$.
We now ask a question from the perspective of $B$. Given $f : A \to B$, can we find a
function $g : B \to A$ whose rule, in effect, is the exact reversal of the rule for $f$? That
is, if $f(8) = -2$, then we want $g(-2) = 8$. It might occur to you right away that
$f$ will need to be a certain kind of function in order for the $g$ derived from it to be
a function itself. For example, what feature will $f$ need to have in order for $g$ to be
defined on all of $B$? The answer: $f$ must be onto. What feature will $f$ need in order
for $g$ to be well defined? The answer: $f$ must be one-to-one. Such a function $g$, if one
exists, is called an inverse of $f$. Here are a definition and a theorem addressing its unique
existence.

Definition 3.2.4. Suppose $f : A \to B$ is a function. Then we say that a function $g : B \to
A$ is an inverse of $f$ if $(g \circ f)(a) = a$ for all $a \in A$ and $(f \circ g)(b) = b$ for all $b \in B$.
Such a function $g$ is generally denoted $f^{-1}$.


Theorem 3.2.5. Given a function $f : A \xrightarrow[\text{onto}]{\text{1-1}} B$, there exists a unique inverse function
$f^{-1} : B \xrightarrow[\text{onto}]{\text{1-1}} A$.

The proof of this theorem has several parts to it. We'll get it started here, though
you'll supply the details in Exercise 4. To define the rule for $g : B \to A$, we must pick
$b \in B$ and explain how to find some $a \in A$ that we're going to call $g(b)$. So for $b \in B$,
define $g(b)$ to be any solution $a$ to the equation $f(a) = b$. Showing $g$ has property F1 is
to show that the equation $f(a) = b$ has some solution in $A$. Showing $g$ has property F2
is to show that $f(a) = b$ has a unique solution. And you're on your way.
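For a finite function, the recipe "let $g(b)$ be the solution $a$ of $f(a) = b$" can be carried out mechanically. This is a sketch under the assumption that functions are stored as Python dictionaries; the name `inverse` and the sample data are hypothetical.

```python
def inverse(f, codomain):
    """Return the inverse of a dict-function f, or raise if f is not a bijection."""
    g = {}
    for a, b in f.items():
        if b in g:
            # f(a) = b has two solutions, so g would fail property F2.
            raise ValueError("f is not one-to-one")
        g[b] = a
    if set(g) != set(codomain):
        # Some b has no solution to f(a) = b, so g would fail property F1.
        raise ValueError("f is not onto")
    return g

f = {1: "a", 2: "b", 3: "c"}
g = inverse(f, {"a", "b", "c"})
```

The two error branches correspond to the two ways the construction can break down: no solution for some $b$, or more than one.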

In Section 3.1, we exploited the notation of functions to discuss the image of a set
$f(A_1)$, where $A_1 \subseteq A$. Similarly, we can slightly abuse the notation $f^{-1}$ to talk about
the pre-image of a set $B_1 \subseteq B$. Even though the function $f^{-1}$ might not exist because
$f$ might fail to be one-to-one or onto, we can still talk about all elements of the domain
that map to the elements of $B_1$.

Definition 3.2.6. Given a function $f : A \to B$ and $B_1 \subseteq B$, we define the set
$f^{-1}(B_1)$, the pre-image of $B_1$, by

\[ f^{-1}(B_1) = \{ a \in A : f(a) \in B_1 \}. \tag{3.10} \]

Thus $x \in f^{-1}(B_1)$ if and only if $f(x) \in B_1$. If $B_1 = \{b\}$ has only one element, we generally
write $f^{-1}(b)$ instead of $f^{-1}(\{b\})$.


The notation $f^{-1}(b)$ instead of $f^{-1}(\{b\})$ for a pre-image set is technically inaccurate,
but it's a common abuse of notation. If $f$ is one-to-one and onto, so that $f^{-1}$ exists as a function,
then $f^{-1}(b)$ is simply the element of $A$ that maps to $b$. In this case, we don't generally
think of $f^{-1}(b)$ as a pre-image subset of the domain that contains this one element.
However, if nothing is known about the existence of $f^{-1}$ as a function, then $f^{-1}(b)$
should be thought of as the set of all domain elements that map to $b$ by $f$. Naturally this
set could be empty, or contain any number of elements.


Example 3.2.7. Consider $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^2$. Determine $f^{-1}(4)$,
$f^{-1}([0, 9])$, and $f^{-1}([-2, -1])$.

Solution: The first one is easy, for $f^{-1}(4) = \{\pm 2\}$. Without going into any detail for
the others, we know that $0 \le f(x) \le 9$ if and only if $x \in [-3, 3]$. Thus $f^{-1}([0, 9]) =
[-3, 3]$. Also, since $f(x) \ge 0$ for all $x \in \mathbb{R}$, $f^{-1}([-2, -1]) = \emptyset$. ■


Given $A_1 \subseteq A$ and $B_1 \subseteq B$, we might want to address the following statements:

\[ f^{-1}(f(A_1)) = A_1, \tag{3.11} \]
\[ f(f^{-1}(B_1)) = B_1. \tag{3.12} \]

It turns out that neither of these statements is true in general, but in each case, one direction
of subset inclusion is true. Example 3.2.7 can suggest counterexamples to demonstrate
how subset inclusion fails in the other direction. See Exercise 7.
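Pre-images are easy to experiment with on a finite piece of the squaring function. The sketch below is an assumption-laden illustration: the domain is cut down to the integers from $-3$ to $3$, functions are stored as Python dictionaries, and the helper names and the sets `A1` and `B1` are hypothetical.

```python
def image(f, subset):
    """Return f(A_1) = {f(x) : x in A_1}."""
    return {f[x] for x in subset}

def preimage(f, targets):
    """Return the pre-image {a in A : f(a) in B_1}."""
    return {a for a in f if f[a] in targets}

# Squaring, restricted to the hypothetical finite domain {-3, ..., 3}.
f = {x: x * x for x in range(-3, 4)}

pre_4 = preimage(f, {4})          # every domain element whose square is 4
pre_neg = preimage(f, {-2, -1})   # no square is negative

A1 = {2}
B1 = {4, 5}
round_trip_A = preimage(f, image(f, A1))   # compare with A1
round_trip_B = image(f, preimage(f, B1))   # compare with B1
```

Comparing `round_trip_A` with `A1` and `round_trip_B` with `B1` shows which direction of subset inclusion survives in each case.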


EXERCISES 


1. Given functions $f : A \to B$ and $g : B \to C$, show that $g \circ f : A \to C$ from
Definition 3.2.1 is a function.

2. Prove or disprove the remaining statements from Q1–Q6 in Section 3.2.1. Of those
that are true, which have hypothesis conditions that can be relaxed?


3. For the false statements in Q1–Q6, replace one hypothesis condition with another
that will make the statement true. Prove.

4. Prove Theorem 3.2.5 in the following way: Let $f : A \xrightarrow[\text{onto}]{\text{1-1}} B$ be given. Define a mapping
$g : B \to A$ by letting $g(b)$ be any solution $a$ to the equation $f(a) = b$. Show $g$ is a
one-to-one, onto function from $B$ to $A$ with the following steps:
(a) Show that $g$ is defined on all of $B$ (F1).
(b) Show that $g$ is well defined (F2).
(c) Show that $g$ is onto.
(d) Show that $g$ is one-to-one.
(e) Show that $(g \circ f)(a) = a$ for all $a \in A$ and $(f \circ g)(b) = b$ for all $b \in B$.
(f) Show that $g$ is unique: Suppose $g_1$ and $g_2$ are two functions from $B$ to $A$ that are
one-to-one, onto and satisfy $(g_1 \circ f)(a) = (g_2 \circ f)(a) = a$ for all $a \in A$ and
$(f \circ g_1)(b) = (f \circ g_2)(b) = b$ for all $b \in B$. Show that $g_1 = g_2$ by showing
$g_1(b) = g_2(b)$ for all $b \in B$.⁴


5. For non-empty sets $A$ and $B$, define $A \equiv B$ if there exists a function $f : A \xrightarrow[\text{onto}]{\text{1-1}} B$.
Show that $\equiv$ defines an equivalence relation on the family of all non-empty sets.

6. Suppose $f : A \to B$ is a function. Prove the following:
(a) If $A_1 \subseteq A_2 \subseteq A$, then $f(A_1) \subseteq f(A_2)$.
(b) If $B_1 \subseteq B_2 \subseteq B$, then $f^{-1}(B_1) \subseteq f^{-1}(B_2)$.
7. Suppose $f : A \to B$ is a function, $A_1 \subseteq A$ and $B_1 \subseteq B$.
(a) Prove the following:
i. $A_1 \subseteq f^{-1}(f(A_1))$.
ii. $f(f^{-1}(B_1)) \subseteq B_1$.
(b) Provide examples to show that the reverse subset inclusions from part (a) are
false.
(c) Determine condition(s) under which the reverse subset inclusions from part (a)
would be true. Prove your claims.


8. Let $f : A \to B$ be a function and $B_1 \subseteq B$. Show $f^{-1}(B_1^C) = [f^{-1}(B_1)]^C$.


⁴All you need are $(f \circ g_1)(b) = (f \circ g_2)(b) = b$ and that $f$ is one-to-one.

3.3 Cardinality of Sets

What would you have in mind by saying that a set has $n$ elements? Or given two sets,
what would you mean by saying they are the same size, that is, they have the same number
of elements? The term we use to denote the number of elements in a set is cardinality.
The question of cardinality is a bit stickier than you might imagine at first. Consider the
following sets:

\[ A = \{-5, -2, 1, 4, 7, 10\} \quad \text{and} \quad B = \{b, d, h, p, f, l\}. \tag{3.13} \]


What would you say is the cardinality of A? Would you be inclined to say that A and B 
have the same cardinality? What about 


CSA OO A and! Dis te 10 50, S10 2 


What would you say is the cardinality of C? Would you say C and D have the same 
cardinality? What about $\mathbb{Z}$ and $\mathbb{Q}$? $\mathbb{Q}$ and $\mathbb{R}$? If a set is finite in cardinality, whatever that
might mean, then saying it has cardinality $n$ ought to be a pretty straightforward term
to define. Also, if two sets are finite, then saying they have the same cardinality ought to
suggest a pretty natural relationship between them. However, when sets are not finite,
things get a little more complicated, as we'll see.


3.3.1 Finite sets

When you looked at A and B in Eq. (3.13), you probably counted six elements of A, 
then used that as a basis for saying A has cardinality six. Doing the same for B, you then 
said A and B have the same cardinality. To count elements in this way is precisely the 
motivation behind the following definition. 


Definition 3.3.1. Let $A$ be a set, and suppose there exists $n \in \mathbb{N}$ such that $A$ can be
placed into one-to-one correspondence with $\mathbb{N}_n = \{1, 2, 3, \ldots, n\}$. That is, there exists
$n \in \mathbb{N}$ and some $f : \mathbb{N}_n \xrightarrow[\text{onto}]{\text{1-1}} A$. Then we say that $A$ is a finite set, and that it has cardinality
$n$, which we write $|A| = n$. For the empty set, we make a special definition, writing $\mathbb{N}_0 = \emptyset$,
and we define the empty mapping $f : \mathbb{N}_0 \to \emptyset$ in order to say that $\emptyset$ is finite and has
cardinality zero.


Defining $f : \mathbb{N}_0 \to \emptyset$ as an empty mapping does not really jibe with Definition 3.1.1,
for in our definition of function, we required that domains and codomains be nonempty.
However, there's no reason we cannot extend the definition of function to include this
case, as long as we make sure that properties F1 and F2 apply to it. And they do. In fact,
the empty mapping meets all the requirements for being a one-to-one, onto function
because all the requirements are statements that involve the universal quantifier. For
example, the empty mapping is onto, for if it were not, then there would exist some
$y \in \emptyset$ which has no pre-image. But no such $y$ exists, so the empty mapping is onto.
This illustrates a strange way that a statement involving the universal quantifier (or if-then)
can be true. Statements of the form $(\forall x \in \emptyset)(P(x))$ are always true. For example,
every human being on Mars has three legs. If a statement is written in if-then form, and
if the hypothesis condition cannot ever be true, then regardless of the conclusion, the
implication statement is true.
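Programming languages usually follow the same convention for universal statements over an empty collection. As a small illustration (the variable names are hypothetical, and the empty mapping is modeled here as an empty Python dictionary), Python's built-in `all` returns `True` when there is nothing to check:

```python
# A universally quantified statement over the empty set is vacuously true.
martians = []   # leg counts of human beings on Mars: there are none to list
all_have_three_legs = all(legs == 3 for legs in martians)

# The empty mapping, stored as an empty dict, passes both tests for the same reason.
empty_map = {}
empty_codomain = set()
is_onto = all(y in empty_map.values() for y in empty_codomain)
is_one_to_one = len(set(empty_map.values())) == len(empty_map)
```

Each check succeeds because there is no element available to serve as a counterexample.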


Example 3.3.2. Show that the set $B$ in Eq. (3.13) has cardinality 6.

Solution: Define $f : \mathbb{N}_6 \to B$ in the following way: Let $f(1) = b$; $f(2) = d$;
$f(3) = h$; $f(4) = p$; $f(5) = f$; $f(6) = l$. Since $f$ is one-to-one and onto $B$, we've
shown that $|B| = 6$. ■

Example 3.3.3. By the identity mapping, $|\mathbb{N}_n| = n$.


Definition 3.3.1 introduces two terms concerning the size of some, but not all sets:
finiteness and cardinality. Regardless of whether a set $A$ is empty, to say that it is finite is
to mean that it has some $n \in \mathbb{W}$ associated with it, called its cardinality, which derives
from the existence of some function $f : \mathbb{N}_n \xrightarrow[\text{onto}]{\text{1-1}} A$. Conversely, to say that $|A| = n$, where
$n \in \mathbb{W}$, is to say that $A$ is finite. One other thing to notice is that $\emptyset$ is the only set with
cardinality zero. For if $A \ne \emptyset$, then no $f : \mathbb{N}_0 \to A$ can be onto. If $A$ is not finite, then
we can still talk about its cardinality, but not in terms of functions $f : \mathbb{N}_n \xrightarrow[\text{onto}]{\text{1-1}} A$.

With Example 3.3.2, we have earned the right to make a statement such as “$B$ has six
elements,” and validated our inclination to walk our fingers over the elements of a finite set
and count them, so to speak. We must be careful, though. The existence of $f : \mathbb{N}_n \xrightarrow[\text{onto}]{\text{1-1}} A$
is a basis for declaring the cardinality of $A$ to be $n$. But how do we know that there is
not some other $m \ne n$ for which there exists $g : \mathbb{N}_m \xrightarrow[\text{onto}]{\text{1-1}} A$? If there were such an $m$,
then cardinality would not be well defined, for $|A|$ could be two different numbers. With
some of the theorems from Sections 3.1 and 3.2, your proof of the following theorem in
Exercise 1 should be quick.


Theorem 3.3.4. The cardinality of a finite set $A$ is well defined. That is, if $|A| = m$ and
$|A| = n$, then $m = n$.

Exercise 5 from Section 3.2 allows us to say that the family of all sets is partitioned into
equivalence classes based on the existence of one-to-one, onto functions between them.
Now we can see what some of these equivalence classes are. Definition 3.3.1 effectively
defines cardinality for some sets based on whether they are in the equivalence class of $\mathbb{N}_n$
for some $n \in \mathbb{W}$. Furthermore, by Theorem 3.3.4, that $n$ is unique. So if $m \ne n$, then $\mathbb{N}_m$
and $\mathbb{N}_n$ are representative elements of different equivalence classes. With this, we have
proved the following.


Theorem 3.3.5. Suppose A is a finite set, and B is any set. Then A and B have the same 
cardinality if and only if there exists a bijection from A to B. 


The proof of Theorem 3.3.6 will require you to call on a lot of the results from 
Sections 3.1 and 3.2. Just remember that all results must be justified in terms of the 
existence of certain one-to-one, onto functions. 


Theorem 3.3.6. If $|A| = m$, $|B| = n$, and $A \cap B = \emptyset$, then $|A \cup B| = m + n$.

Theorem 3.3.6 says that if there exist $f_1 : \mathbb{N}_m \xrightarrow[\text{onto}]{\text{1-1}} A$ and $f_2 : \mathbb{N}_n \xrightarrow[\text{onto}]{\text{1-1}} B$, then you
can construct some $g : \mathbb{N}_{m+n} \xrightarrow[\text{onto}]{\text{1-1}} A \cup B$. You'll do this in Exercise 2. Once you've defined
$g$, showing it is a one-to-one, onto function from $\mathbb{N}_{m+n}$ to $A \cup B$ should amount to little
more than calling on previous exercises you've done. You'll prove the next two theorems
in Exercises 3 and 5.
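One natural way to build such a $g$ for Theorem 3.3.6 is to count the elements of $A$ first and then continue counting through $B$. The sketch below is only an illustration of that idea on finite data, not the proof asked for in Exercise 2; it assumes counting functions are stored as Python dictionaries keyed by $1, 2, \ldots$, and the name `count_union` and the sample sets are hypothetical.

```python
def count_union(f1, f2):
    """Given bijections f1: N_m -> A and f2: N_n -> B with A, B disjoint,
    build g: N_{m+n} -> A u B by listing A first, then B."""
    m, n = len(f1), len(f2)
    g = {k: f1[k] for k in range(1, m + 1)}
    # Indices m+1, ..., m+n are shifted back by m before f2 is applied.
    g.update({k: f2[k - m] for k in range(m + 1, m + n + 1)})
    return g

f1 = {1: "p", 2: "q"}            # counts A = {"p", "q"}, so |A| = 2
f2 = {1: "r", 2: "s", 3: "t"}    # counts B = {"r", "s", "t"}, so |B| = 3
g = count_union(f1, f2)
```

The index shift by $m$ in the second half plays the same role as the function $T$ from Example 3.1.5.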


Theorem 3.3.7. If $|B| = n$ and $A \subseteq B$, then $A$ is finite and satisfies $|A| \le n$.

Corollary 3.3.8. Suppose $|A| = n$ and $C$ is any set. Then $|A \cap C| \le |A|$.

Proof: Since $A \cap C \subseteq A$, the result is immediate from Theorem 3.3.7. ■

Theorem 3.3.9. If $|A| = m$ and $|B| = n$, then $|A \cup B| \le m + n$.


3.3.2 Infinite sets 


The following definition shouldn’t be too surprising. 


Definition 3.3.10. If a set is not finite, it is said to be infinite. 


In terms of the definition of finite set, what is characteristic of an infinite set?⁵ A set
$A$ is infinite if, for every $n \in \mathbb{W}$ and every $f : \mathbb{N}_n \to A$, either $f$ is not one-to-one, or
$f$ is not onto. In reality, if there were some $f : \mathbb{N}_n \xrightarrow{\text{onto}} A$ that is not one-to-one, we
could create a function $f_1 : \mathbb{N}_m \to A$ for some $m \le n$ that culls out any repetition in
$f$. Here's how. First create a subset $S \subseteq \mathbb{N}_n$ so that $f : S \to A$ is one-to-one in the
following way. For each $a \in A$, take a single element of $f^{-1}(a)$, and let $S$ consist of
all these chosen pre-images.⁶ Then apply Theorem 3.1.14 to conclude the existence of
$g : S \xrightarrow[\text{onto}]{\text{1-1}} \mathbb{N}_m$ for some $m \le n$. Then $f_1 = f \circ g^{-1}$ is a one-to-one, onto function from
$\mathbb{N}_m$ to $A$. The upshot is that a set $A$ is infinite if for every $n \in \mathbb{N}$ and $f : \mathbb{N}_n \to A$, $f$ is
not onto. Loosely speaking, there are not enough elements in $\mathbb{N}_n$ for any $n \in \mathbb{W}$ to tag all
the elements of $A$. No matter how you might try to count them exhaustively using only
the elements of $\mathbb{N}_n$ for some $n$, you'll never count them all.

Before we get into the interesting results of infinite sets, let’s point out where it will 
lead. Strangely, just because two sets are both infinite, it does not follow that they have the 
same cardinality, in the sense that there is a one-to-one correspondence between them. 
Some infinite sets are actually bigger than others. This makes for some real surprises, 
and motivates us to discuss different orders of infinity. In fact, it’s possible to generate an 
infinite sequence of infinite sets $A_1, A_2, \ldots$, where $|A_n|$ is a higher order of infinity than
$|A_{n-1}|$. It's mind boggling. Not only is there more than one size of infinity, but there are
infinitely many infinities. We'll look at only two. Here’s our first infinity. 


Definition 3.3.11. Suppose $A$ is an infinite set. Then $A$ is said to be countably infinite if
there exists a function $f : \mathbb{N} \xrightarrow[\text{onto}]{\text{1-1}} A$, and we say that $A$ has cardinality $\aleph_0$ (the Hebrew letter
aleph). If $A$ is finite or countably infinite, we say that $A$ is countable. If $A$ is not countable,
we say that it is uncountable.

By calling on the identity mapping $i : \mathbb{N} \to \mathbb{N}$, we see that $\mathbb{N}$ is countably infinite, and
the family of all countably infinite sets is precisely the equivalence class of $\mathbb{N}$ from the
equivalence definition in Exercise 5 from Section 3.2. Let's look at some other countably
infinite sets and some theorems about them.

⁵Negate Definition 3.3.1.
⁶See the discussion of the axiom of choice on page 95.

Theorem 3.3.12. $\mathbb{Z}$ is countable.

You'll prove Theorem 3.3.12 in Exercise 7. Your task is to construct a function $f :
\mathbb{N} \xrightarrow[\text{onto}]{\text{1-1}} \mathbb{Z}$ by declaring $f(1)$, $f(2)$, $f(3)$, etc. Once you've developed such a function, see
if you can come up with an explicit formula for $f(n)$. Specifically, you might find a way
to define the one-to-one correspondence as

\[ f(n) = \begin{cases} \ldots, & \text{if } n \text{ is even}, \\ \ldots, & \text{if } n \text{ is odd}, \end{cases} \]

and then apply Exercise 10 from Section 3.1.
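To make the even/odd idea concrete, here is one possible formula, offered as an assumption rather than as the formula intended in the text: send the even numbers to the positive integers and the odd numbers to zero and the negative integers. The function name `f` and the range tested are illustrative only.

```python
def f(n):
    """One possible bijection from N = {1, 2, 3, ...} to Z (an assumed formula).

    Even n map to 1, 2, 3, ...; odd n map to 0, -1, -2, ...
    """
    if n % 2 == 0:
        return n // 2
    return -((n - 1) // 2)

first_values = [f(n) for n in range(1, 8)]   # f(1), ..., f(7)
```

Listing `first_values` shows the integers being visited in an alternating pattern that starts at zero. Proving that every integer is reached exactly once is still the work of Exercise 7.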

If we expect to stumble eventually onto an uncountable set, the rationals might seem 
to be the first one we would find. After all, in Exercise 12 from Section 2.8, we showed that
$a < (a + b)/2 < b$, so that between any two rational numbers there is another rational
number. Thus the rational numbers are strewn densely up and down the real number
line, as opposed to the integers, which have plenty of room in between each one. Well,
$\mathbb{Q}$ is countable. Theorem 3.3.13 starts with only the positive rationals and says that there
are no more positive rationals than there are natural numbers. Even though $\mathbb{N} \subset \mathbb{Q}^+$,
and even though rational numbers are densely scattered up and down the real line, it is
possible to list elements of $\mathbb{Q}^+$ sequentially as $f(1)$, $f(2)$, $f(3)$, etc. to create a function
$f : \mathbb{N} \xrightarrow[\text{onto}]{\text{1-1}} \mathbb{Q}^+$. Here's how.


Theorem 3.3.13. $\mathbb{Q}^+$ is countable.

Proof: Consider Fig. 3.5, which lists all elements of $\mathbb{Q}^+$. From this figure, we may
define $f$ in the following way. Starting in the upper left-hand corner of the table,
define $f(1) = 1/1$. We then move down successive diagonals from upper right to
lower left, defining $f(2) = 1/2$ and $f(3) = 2/1$ from the first diagonal. Moving
down the next diagonal, we define $f(4) = 1/3$. But since $2/2 = 1/1$, we skip $2/2$
and define $f(5) = 3/1$. Continuing in this fashion,

\[ f(6) = \frac{1}{4}, \quad f(7) = \frac{2}{3}, \quad f(8) = \frac{3}{2}, \quad f(9) = \frac{4}{1}, \quad f(10) = \frac{1}{5}, \quad f(11) = \frac{5}{1}, \]

and so on, making sure $f$ is one-to-one by skipping over rational numbers that are
not in reduced form. This program guarantees that $f$ is one-to-one and onto, so we
have shown that $\mathbb{Q}^+$ is countable. ■

Figure 3.5 Ordering of $\mathbb{Q}^+$ to show countability.
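The diagonal walk in this proof translates directly into a short program. The following sketch assumes the grid is laid out as the proof describes, with numerators increasing along each diagonal; the generator name `positive_rationals` is hypothetical. It uses the standard-library `Fraction` type and `math.gcd` to detect entries that are not in reduced form.

```python
from fractions import Fraction
from itertools import islice
from math import gcd

def positive_rationals():
    """Yield f(1), f(2), ... by walking the diagonals and skipping unreduced entries."""
    total = 2   # numerator + denominator is constant along each diagonal
    while True:
        for numerator in range(1, total):
            denominator = total - numerator
            if gcd(numerator, denominator) == 1:   # keep reduced fractions only
                yield Fraction(numerator, denominator)
        total += 1

first_eleven = list(islice(positive_rationals(), 11))
```

Because every positive rational has exactly one reduced form, and that form sits on exactly one diagonal, each value is produced exactly once.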


Let’s pause here for a moment and reflect on the proof of Theorem 3.3.13. We can 
think of finding a one-to-one correspondence $f : \mathbb{N} \to A$ as sequencing the elements of
$A$ as $(a_1, a_2, a_3, \ldots)$ where every element of $A$ is in the list exactly once. If such a listing
of elements of $A$ can be found, then you've shown $A$ is countably infinite.

In tracing through the grid in the proof of Theorem 3.3.13, we ensured that $f :
\mathbb{N} \to \mathbb{Q}^+$ is one-to-one by skipping over entries that were not in reduced form. If we
had not done this skipping, the resulting $f$ would still have been onto, but not one-to-one.
Knowing a priori that $\mathbb{Q}^+$ is infinite, this failure of $f$ to be one-to-one would not
be particularly disconcerting. Loosely speaking, here's why. If there are enough natural
numbers to tag all elements of $\mathbb{Q}^+$ with some repetition, then there ought to be enough to
tag them without repetition. The next theorem gives us the freedom not to worry about 
this repetition, and will save us from some minor headaches later. 


Theorem 3.3.14. A nonempty set $A$ is countable if and only if there exists a function
$f : \mathbb{N} \xrightarrow{\text{onto}} A$.

Proof:

($\Rightarrow$) Suppose $A$ is nonempty and countable. Then by definition either $A$ is finite or
countably infinite. If $A$ is countably infinite, then there exists $f : \mathbb{N} \xrightarrow[\text{onto}]{\text{1-1}} A$, and
clearly there exists an onto function. So suppose $A$ is finite. Then there exists
$n \in \mathbb{N}$ and a function $f : \mathbb{N}_n \xrightarrow[\text{onto}]{\text{1-1}} A$. We may extend the domain of $f$ to all of
$\mathbb{N}$ and map every $k \ge n + 1$ to any element of $A$ we choose. Thus $f : \mathbb{N} \to A$
is onto.

($\Leftarrow$) Suppose there exists $f : \mathbb{N} \xrightarrow{\text{onto}} A$, and suppose $A$ is not finite. (See Exercise 3g
from Section 1.2.) The plan is to consider the sequence

\[ f(1), f(2), f(3), f(4), \ldots \tag{3.14} \]

and select a subsequence that we can write as

\[ g(1), g(2), g(3), g(4), \ldots, \]

where we effectively remove repetition from (3.14).

Let $g(1) = f(1)$. Let $g(2)$ be the first term in (3.14) different from $g(1)$.
Let $g(3)$ be the first term of (3.14) different from $g(1)$ and $g(2)$, and so on.
The fact that $A$ is not finite means that a new, different term from (3.14) can
always be found. We claim that the resulting $g : \mathbb{N} \to A$ is one-to-one and
onto. Here's why.

Pick any $a \in A$. Because $f$ is onto, there exists $n \in \mathbb{N}$ such that $f(n) = a$.
Thus $S = \{n \in \mathbb{N} : f(n) = a\}$ is a nonempty subset of $\mathbb{N}$. By the WOP, $S$
contains a smallest element $k$. According to the program by which $g$ was constructed,
$f(k)$ is chosen to be $g(j)$ for some $j \le k$. Thus $g$ is onto. Also, $g$ is
one-to-one because $k$ is the unique smallest element of $S$ and all other $f(n)$
for $n > k$ are skipped over in the defining of $g$. ■
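The construction of $g$ from $f$ in this proof is a "keep the first occurrence, skip every repeat" procedure. The sketch below illustrates it on a finite list, under the assumption that the values $f(1), f(2), \ldots$ are supplied as a Python iterable; the name `remove_repetition` and the sample listing are hypothetical.

```python
def remove_repetition(values):
    """Yield each value the first time it appears, skipping later repeats."""
    seen = set()
    for value in values:
        if value not in seen:
            seen.add(value)
            yield value

# Hypothetical onto listing f(1), f(2), ... of a set A, with repetition.
listing = ["a", "b", "a", "c", "b", "d"]
g = list(remove_repetition(listing))   # g(1), g(2), ... with repeats removed
```

Because the generator is lazy, the same function would also work on an unending stream of values, which is closer to the situation in the proof.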


The convenience of Theorem 3.3.14 will become apparent in the following theorems.
For example, given two countable sets, Theorem 3.3.15 allows you to show that the union
is countable by constructing a mapping $f : \mathbb{N} \xrightarrow{\text{onto}} A \cup B$ without having to worry
whether $A$ and $B$ are disjoint, or whether they are finite or countably infinite. The proofs
of Theorems 3.3.15–3.3.17 are left as exercises. Since Theorem 3.3.14 does not apply to
the empty set, it can only be invoked in proving Theorems 3.3.15–3.3.17 if we know that
all sets involved are nonempty. But since the union across a family of sets is unchanged if
all empty sets in the family are omitted, we're free to assume that all sets in the following
theorems are nonempty.

Theorem 3.3.15. If $A$ and $B$ are countable, then so is $A \cup B$.

Theorem 3.3.16. If $\{A_k\}_{k=1}^{n}$ is a finite family of countable sets, then $\bigcup_{k=1}^{n} A_k$ is countable.

Theorem 3.3.17. If $\{A_n\}_{n=1}^{\infty}$ is a countably infinite family of countable sets, then $\bigcup_{n=1}^{\infty} A_n$
is countable.

By Theorem 3.3.16, it follows that $\mathbb{Q}$ is countable, for $\mathbb{Q} = \mathbb{Q}^+ \cup \mathbb{Q}^- \cup \{0\}$, and $\mathbb{Q}^-$
is countable in the same way that $\mathbb{Q}^+$ is.

If every $f : \mathbb{N} \to A$ fails to be onto, then $A$ is uncountable. This is how we prove the
following:


Theorem 3.3.18. [0, 1] is uncountable. 
Proof: Suppose f : N → [0, 1] is any function whatsoever. We can show that f
fails to be onto. From assumption A21, we may think of [0, 1] as the set of all decimal
representations of the form 0.XXXXX.... Now consider the following diagram:

\[
\begin{aligned}
f(1) &= 0.a_1 a_2 a_3 a_4 \ldots \\
f(2) &= 0.b_1 b_2 b_3 b_4 \ldots \\
f(3) &= 0.c_1 c_2 c_3 c_4 \ldots \\
f(4) &= 0.d_1 d_2 d_3 d_4 \ldots \\
f(5) &= 0.e_1 e_2 e_3 e_4 \ldots \\
&\;\;\vdots
\end{aligned}
\]

We can show f is not onto by constructing a real number x ∈ [0, 1] that is not in
the list. Let $x = 0.x_1 x_2 x_3 x_4 \ldots$ where $x_1 \ne a_1$, $x_2 \ne b_2$, $x_3 \ne c_3$, etc. and no $x_n = 9$.
Then x ∈ [0, 1], and x is different from every number in the range of f. Thus [0, 1]
is not countable. ∎
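The diagonal construction in this proof can be tried out on a finite list of digit strings. The following is only a sketch under stated assumptions: the function name `diagonal_escape`, the sample listing, and the rule of picking 5 (or 4 when the diagonal digit is already 5) are illustrative choices, not part of the proof. Any rule that avoids the diagonal digit and never uses 9 would serve.

```python
def diagonal_escape(rows):
    """Build digits x_1 x_2 ... so that x_k differs from the k-th digit of rows[k-1].

    rows: list of digit strings; the k-th string must have at least k digits.
    The digit 9 is never chosen, mirroring the condition in the proof.
    """
    digits = []
    for k, row in enumerate(rows):
        diagonal_digit = int(row[k])
        # Hypothetical rule: use 5 unless the diagonal digit is 5, then use 4.
        digits.append("5" if diagonal_digit != 5 else "4")
    return "".join(digits)


# Hypothetical first five entries f(1), ..., f(5), digits after the decimal point.
listing = ["14159", "71828", "41521", "57721", "69314"]
x = diagonal_escape(listing)
print("0." + x)
```

Because x disagrees with the k-th entry in its k-th digit, it cannot equal any entry of the list, no matter how the list was chosen.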


You’ll prove Theorem 3.3.19 in Exercise 11.


Theorem 3.3.19. If B is countable and A ⊆ B, then A is countable.


Corollary 3.3.20. R is uncountable.


Proof: Since [0, 1] is uncountable and [0, 1] ⊆ R, the contrapositive of Theorem 3.3.19
implies that R is uncountable. ∎
EXERCISES 
1. Prove Theorem 3.3.4: If |A| = m and |A| = n, then m = n.⁷
2. Prove Theorem 3.3.6: If |A| = m, |B| = n, and A ∩ B = ∅, then |A ∪ B| = m + n.⁸
3. Prove Theorem 3.3.7: If |B| = n and A ⊆ B, then A is finite and satisfies |A| ≤ n.⁹
4. Suppose |A| = |B| = n, and suppose f : A → B is any one-to-one function. Show that f
must be onto.¹⁰
5. Prove Theorem 3.3.9: If |A| = m and |B| = n, then |A ∪ B| ≤ m + n.¹¹
6. Prove that the union of a finite number of finite sets is finite.¹²
7. Prove Theorem 3.3.12: Z is countable.
8. Prove Theorem 3.3.15: If A and B are countable, then so is A ∪ B.
9. Prove Theorem 3.3.16: If $\{A_k\}_{k=1}^{n}$ is a finite family of countable sets, then $\bigcup_{k=1}^{n} A_k$ is
countable.
10. Prove Theorem 3.3.17: If $\{A_n\}_{n=1}^{\infty}$ is a countably infinite family of countable sets,
then $\bigcup_{n=1}^{\infty} A_n$ is countable.¹³
11. Prove Theorem 3.3.19: If B is countable and A ⊆ B, then A is countable.¹⁴
12. Prove that the irrationals are uncountable.


⁷ Take care of ∅ as a special case. Otherwise, if the theorem is false, it violates Theorem 3.1.15.
8Make use of Theorem 3.1.13 where the domain of h is Nn and the range is A U B. The function 
T : N, — N}’ from Example 3.1.5 should be helpful, too. 


⁹ If $f : N_n \to B$ is one-to-one and onto, apply Theorem 3.1.14 to $f^{-1}(A)$. Construct $g : N_m \to A$ by composition.

¹⁰ What if f is not onto? By Theorem 3.3.5, |A| = |Rng f|.

¹¹ See Exercise 7 from Section 2.1.

¹² Use induction and Theorem 3.3.9.

¹³ Induction won’t help here. Try a grid like that in the proof of Theorem 3.3.13.

¹⁴ Use Theorem 3.3.14 to take care of A ≠ ∅.

13. Let a < b and c < d. Using interval notation [a, b] = {x : a ≤ x ≤ b}, show that
|[a, b]| = |[c, d]| by finding a one-to-one, onto function f : [a, b] → [c, d].


3.4 Counting Methods and the Binomial Theorem


How many ways are there to put together a meal from all the cafeteria offerings? To 
arrange your CD collection on your dresser? To name a baby boy from a list of family 
preferences? To choose a steering committee for your club? In this section, we’re going 
to apply the cardinality results from Section 3.3 to answer these and other questions. 
We'll develop and apply results to finite, nonempty sets and functions that relate them, 
though some results will be more generally applicable. Because the main goal of this 
section is to gain exposure to some complex cardinality questions whose answers are 
really quite plausible and have very practical application, we'll sometimes talk through 
informal arguments instead of providing or requiring rigorous proof. 

To simplify some of the language and notation, if A is a finite set and $f : N_n \to A$
is a one-to-one, onto function, we’ll address elements of A as $\{a_1, a_2, \ldots, a_n\}$, where
$a_k = f(k)$ for $1 \le k \le n$. Since f is a function, every $k \in N_n$ can be used to reference a
unique $a_k \in A$. Since f is one-to-one, all the $a_k$ are different, and since f is onto, every
element of A can be addressed as $a_k$ for some $k \in N_n$.

In Section 2.9, we defined the Cartesian product of two sets A and B as 


\[ A \times B = \{(a, b) : a \in A,\ b \in B\}, \tag{3.15} \]

the set of ordered pairs whose first coordinate comes from A, and whose second coordinate
comes from B. Assuming A and B each have a form of equality defined on them,
which we temporarily write as $=_A$ and $=_B$, we can define equality in A × B by

\[ (a_1, b_1) =_{A \times B} (a_2, b_2) \text{ if and only if } a_1 =_A a_2 \text{ and } b_1 =_B b_2. \]

Assuming $=_A$ and $=_B$ are equivalence relations, we can show that $=_{A \times B}$ defines an
equivalence relation. For property E1, if we pick (a, b) ∈ A × B, then $a =_A a$ and
$b =_B b$, so that $(a, b) =_{A \times B} (a, b)$. Showing properties E2 and E3 is equally easy
(Exercise 1). From now on we won’t bother with different notations for equality in the
different sets, instead simply writing that $(a_1, b_1) = (a_2, b_2)$ if and only if $a_1 = a_2$ and
$b_1 = b_2$.


3.4.1 The product rule 


Our first theorem will serve you as the root of an induction argument when you prove 
Theorem 3.4.4. 


Theorem 3.4.1. Suppose |A| = 1 and |B| = n ≥ 1. Then |B| = |A × B|.


We’ll prove Theorem 3.4.1 by demonstrating a one-to-one, onto function f : B → A × B
to show that they’re in the same cardinality class.


Proof: Write A = {a} and $B = \{b_1, b_2, \ldots, b_n\}$. Define f : B → A × B by $f(b_k) = (a, b_k)$.
We show f is a one-to-one function from B onto A × B. First, for any
$b_k \in B$, $(a, b_k) \in A \times B$, so f has property F1. Also, if $b_j = b_k$, then $f(b_j) =
(a, b_j) = (a, b_k) = f(b_k)$, so that f has property F2. If we suppose $f(b_j) = f(b_k)$,
then $(a, b_j) = (a, b_k)$, which implies $b_j = b_k$, and f is one-to-one. Finally, if we
pick any $(a, b_k) \in A \times B$, then $b_k \in B$ and $f(b_k) = (a, b_k)$, so that f is onto. Since
we have found a one-to-one correspondence between B and A × B, they are in the
same cardinality class, and |B| = |A × B|. ∎


Here’s a theorem you'll prove in Exercise 2, and then an immediate corollary. 


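Theorem 3.4.2. If $A_1$, $A_2$, and B are sets, then

\[ (A_1 \cup A_2) \times B = (A_1 \times B) \cup (A_2 \times B). \tag{3.16} \]

Furthermore, if $A_1 \cap A_2 = \emptyset$, then $(A_1 \times B) \cap (A_2 \times B) = \emptyset$.

Corollary 3.4.3. If $A_1$, $A_2$, and B are finite sets and $A_1 \cap A_2 = \emptyset$, then
$|(A_1 \cup A_2) \times B| = |A_1 \times B| + |A_2 \times B|$.

Proof: By Theorems 3.3.6 and 3.4.2,

\[ |(A_1 \cup A_2) \times B| = |(A_1 \times B) \cup (A_2 \times B)| = |A_1 \times B| + |A_2 \times B|. \tag{3.17} \]
∎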


With Theorem 3.4.1 and Corollary 3.4.3, you have all you need to write an induction 
argument for the following (Exercise 3): 


Theorem 3.4.4. Suppose A and B are nonempty, finite sets. Then

\[ |A \times B| = |A| \times |B|. \tag{3.18} \]


We can form the Cartesian product of more than two sets, but not without giving 
ourselves a freedom that would make a rigorous set theorist a bit uncomfortable. We 
want A × B × C to be the set of ordered triples (a, b, c) where a ∈ A, b ∈ B, and
c ∈ C. However, to define A × B × C in terms of the binary Cartesian product defined
in Eq. (3.15), we need to associate either A and B, or B and C. This leaves us with
(A × B) × C or A × (B × C), which, unfortunately, are different. As we defined the
Cartesian product in Eq. (3.15),

\[ (A \times B) \times C = \{((a, b), c) : a \in A,\ b \in B,\ c \in C\}, \tag{3.19} \]

but

\[ A \times (B \times C) = \{(a, (b, c)) : a \in A,\ b \in B,\ c \in C\}. \tag{3.20} \]

Expressions of the form ((a, b), c) and (a, (b, c)) are not the same, though this doesn’t
create an insurmountable obstacle. Rather than trying to deal with the lack of associativity
in the Cartesian product of sets, we’ll make the following recursive definition, then give
ourselves a freedom to ignore the associativity question.

Definition 3.4.5. Suppose $\{A_k\}$ is a family of sets indexed by N. Then we define the
Cartesian product recursively as

\[ \prod_{k=1}^{1} A_k = A_1, \tag{3.21} \]

\[ \prod_{k=1}^{n+1} A_k = \Bigl(\prod_{k=1}^{n} A_k\Bigr) \times A_{n+1} \quad \text{for } n \ge 1. \tag{3.22} \]

With Definition 3.4.5, elements of $\prod_{k=1}^{n} A_k$ for the first few values of n would look like
this:

\[
\begin{aligned}
\prod_{k=1}^{1} A_k &= \{a_1 : a_1 \in A_1\}, \\
\prod_{k=1}^{2} A_k &= \{(a_1, a_2) : a_k \in A_k \text{ for } 1 \le k \le 2\}, \\
\prod_{k=1}^{3} A_k &= \{((a_1, a_2), a_3) : a_k \in A_k \text{ for } 1 \le k \le 3\}, \\
\prod_{k=1}^{4} A_k &= \{(((a_1, a_2), a_3), a_4) : a_k \in A_k \text{ for } 1 \le k \le 4\}.
\end{aligned}
\tag{3.23}
\]

Instead of writing elements of $\prod_{k=1}^{n} A_k$ this way, we’ll give ourselves the freedom
to address elements as $(a_1, a_2, \ldots, a_n)$, n-tuples where $a_k \in A_k$ for all $1 \le k \le n$. With
this definition in hand, and another induction argument, you can prove the following
(Exercise 4):


Theorem 3.4.6. Suppose $\{A_k\}_{k=1}^{n}$ is a finite family of finite sets. Then

\[ \Bigl|\prod_{k=1}^{n} A_k\Bigr| = \prod_{k=1}^{n} |A_k|. \tag{3.24} \]


Here are some practical applications of Theorem 3.4.6.


Example 3.4.7. Your university cafeteria has the following menu for today’s lunch: 


Meats: Meatloaf, Chicken, Fishsticks 
Starchy vegetables: Potatoes, Rice, Corn, Pasta 
Green vegetables: Beans, Broccoli, Salad, Spinach 
Breads: Rolls, Cornbread 
Desserts: Chocolate cake, Pudding 


If you choose one item from each category of the menu, how many different meals could 
you put together? 


Solution: If we let $A_1$ be the set of meat offerings, $A_2$ the set of starchy vegetables,
etc., then each potential complete meal is an element of $\prod_{k=1}^{5} A_k$. By Theorem 3.4.6,
and counting the items listed in each category, there are 3 × 4 × 4 × 2 × 2 = 192 possible meals. ∎


In Example 3.4.7, counting the number of possible meals can be visualized in the
following way. We have five empty slots to fill on our plate, where the first is to be filled
with a choice of meat, the second with a choice of starchy vegetable, etc. Furthermore,
the number of choices available to fill a particular slot is unaffected by the way any of the
previous slots has been filled, or the way the remaining slots will be filled. Multiplying
the number of ways to fill each slot illustrates the product rule. If an n-step process is
such that the kth step can be done in $a_k$ ways, and the number of ways each step can be
done is unaffected by the choice made for any other step, then the total number of ways
to perform all n steps is $\prod_{k=1}^{n} a_k$.


Example 3.4.8. Suppose you have five shirts, three pairs of pants, five pairs of socks, 
and two pairs of shoes, all of which match each other. How many different outfits can 
you put together, if an outfit consists of one shirt, one pair of pants, one pair of socks, 
and one pair of shoes? 


Solution: If we let $C_1$, $C_2$, $C_3$, and $C_4$ be the sets of shirts, pants, socks, and shoes,
respectively, then $\prod_{k=1}^{4} C_k$ is the set of all distinct outfits. But
$\bigl|\prod_{k=1}^{4} C_k\bigr| = 5 \times 3 \times 5 \times 2 = 150$, so there are 150 distinct outfits. ∎
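To see the product rule at work, here is a small sketch that builds the Cartesian product of the four wardrobe sets explicitly and counts its elements. The item labels (`shirt1`, `pants1`, and so on) are made up for illustration; only the set sizes 5, 3, 5, and 2 come from the example.

```python
from itertools import product
from math import prod

# Hypothetical labels; only the sizes 5, 3, 5, 2 come from the example.
shirts = [f"shirt{i}" for i in range(1, 6)]
pants = [f"pants{i}" for i in range(1, 4)]
socks = [f"socks{i}" for i in range(1, 6)]
shoes = [f"shoes{i}" for i in range(1, 3)]

# Every outfit is one 4-tuple in the Cartesian product of the four sets.
outfits = list(product(shirts, pants, socks, shoes))

# The product rule predicts the size without listing anything.
predicted = prod(len(c) for c in (shirts, pants, socks, shoes))

print(len(outfits), predicted)
```

Listing the tuples is only feasible for small sets; the point of Theorem 3.4.6 is that the count is available without the listing.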


Now suppose A and B are nonempty sets with |A| = m and |B| = n. We can use
Theorem 3.4.6 to calculate the number of functions from A to B in the following way. If
we form the Cartesian product of m copies of B, then any ordered m-tuple from $\prod_{k=1}^{m} B$
can be thought of as a function f : A → B, where the first coordinate of the m-tuple
is $f(a_1)$, the second coordinate is $f(a_2)$, etc. Since the cardinality of $\prod_{k=1}^{m} B$ is $n^m$, we
have the following:


Theorem 3.4.9. If A and B are nonempty sets with |A| = m and |B| = n, then there
are $n^m$ distinct functions from A to B.


Theorem 3.4.9 motivates a general notation. Regardless of whether A or B is finite,
we write $B^A$ to mean the set of all functions from A to B. For example, $R^{[0,1]}$ is the set
of all functions from the interval [0, 1] to R. Here’s an illustration of the practical use of
Theorem 3.4.9.


Example 3.4.10. Suppose you toss a coin 10 times and observe the sequence of outcomes
of heads (H) or tails (T). Each possible outcome of the ten tossings can be written as a
10-tuple of the form (H,H,H,T,H,H,T,T,H,T), each of which we can visualize as a function
$f : N_{10} \to \{H, T\}$. There are $2^{10} = 1{,}024$ such functions.
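The count of functions into a two-element set can be confirmed by brute force. This sketch enumerates every 10-tuple of H and T and compares the total with the formula of Theorem 3.4.9; the helper name `count_functions` is an illustrative choice.

```python
from itertools import product


def count_functions(m, n):
    """Number of functions from an m-element set to an n-element set (Theorem 3.4.9)."""
    return n ** m


# Each 10-tuple records the values f(1), ..., f(10) of one function into {H, T}.
outcomes = list(product("HT", repeat=10))

print(len(outcomes), count_functions(10, 2))
```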


If a counting problem can be visualized as filling m labeled slots with any of n
possible objects, where any one of the objects can be used with unlimited repetition, then
Theorem 3.4.9 implies that there are $n^m$ ways to do it.


Example 3.4.11. A password to your computer account must be precisely 7 alphanumeric
characters, that is, a sequence of 7 characters taken from the 26 letters of the
alphabet and the digits 0-9. Filling 7 ordered slots with any arrangement of these 36
characters reveals that there are $36^7 = 78{,}364{,}164{,}096$ passwords.

3.4.2 Permutations

Suppose we want to fill m slots with choices from n objects, but repetition of the objects
is not allowed. Such an arrangement is called a permutation of n objects taken m at a
time. We can think of such an assignment as a function from an m-element set to an
n-element set, but the function must be one-to-one. In order for there to exist any such
function, it must be that m ≤ n. So assuming |A| = m ≤ n = |B|, let’s count the number
of functions from A to B that are one-to-one. To see what the right answer ought to
be, imagine making choices for $f(a_1)$, then $f(a_2)$, etc. For $f(a_1)$, there are n possible
choices, then n − 1 choices for $f(a_2)$, n − 2 choices for $f(a_3)$, etc. If the product rule
proves to be correct, there will be n(n − 1)(n − 2)···(n − m + 1) functions. We can
write this as

\[ P(n, m) = n(n-1)(n-2)\cdots(n-m+1) = \frac{n!}{(n-m)!}. \tag{3.25} \]

Proving there are P(n, m) one-to-one functions from an m-element set to an n-element
set demonstrates that P(n, m) is the number of permutations of n objects taken m at a
time. A rigorous proof should be done by induction (Exercise 5).


Theorem 3.4.12. Let A and B be nonempty sets with |A| = m ≤ n = |B|. Then there
are P(n, m) distinct one-to-one functions from A to B.


Example 3.4.13. The Fitzpatricks are expecting a baby boy any day now. Family mem- 
bers have strong opinions about what the child will be named. Suggested names are 
William, Warren, Benjamin, Fitzhugh, Chancellor, Millhouse, and Nebuchadnezzar. The 
parents want their son to have three given names, such as William Fitzhugh Millhouse 
Fitzpatrick. How many distinct names are there for the Fitzpatrick son? 


Solution: We may let A = {1, 2, 3} be the positions for the names to be chosen, and
B the set of names. A complete name can be thought of as a one-to-one function f : A → B.
There are P(7, 3) = 7!/4! = 7 · 6 · 5 = 210 such functions. ∎
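As a check on the count, the following sketch lists every ordered choice of three distinct names from the seven suggestions and compares the total with P(7, 3). It relies on `math.perm`, available in Python 3.8 and later.

```python
from itertools import permutations
from math import factorial, perm

names = ["William", "Warren", "Benjamin", "Fitzhugh",
         "Chancellor", "Millhouse", "Nebuchadnezzar"]

# Each 3-tuple of distinct names is a one-to-one function from {1, 2, 3} into the names.
full_names = list(permutations(names, 3))

print(len(full_names), perm(7, 3), factorial(7) // factorial(4))
```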


If there are as many slots to fill as objects to place in them, then there are P(n, n) =
n!/(n − n)! = n! ways to do it. We call such an assignment a permutation of n objects.


Example 3.4.14. If you have a collection of eight CDs and want to arrange them in a 
row on top of your dresser, there are P(8, 8) = 8! = 40,320 ways to do it. 


3.4.3 Combinations and partitions 


Now let’s suppose we're going to select m objects out of a supply of n, but instead of 
counting the number of ways that m of the n objects can be arranged, we want to count 
how many ways there are simply to choose m of the n objects, without considering the 
order of the objects as relevant. In other words, given a set of n elements, how many 
m-element subsets are there? When we create an m-element subset of an n-element set, 
we call it a combination of n objects taken m at a time. 


One way to see how many m-element subsets exist from an n-element set is to finagle
the answer from P(n, m). Let S be the set of all the P(n, m) arrangements of m elements
of an n-element set, and partition S into subsets of arrangements that all contain precisely
the same choices. For example, if A = {a, b, c, d} and we are going to choose m = 3
of these four elements, then S contains P(4, 3) = 4!/1! = 24 permutations, and the
family of subsets of S that we’ve described consists of:

\[
\begin{aligned}
S_1 &= \{(a, b, c), (a, c, b), (b, a, c), (b, c, a), (c, a, b), (c, b, a)\}, \\
S_2 &= \{(a, b, d), (a, d, b), (b, a, d), (b, d, a), (d, a, b), (d, b, a)\}, \\
S_3 &= \{(a, c, d), (a, d, c), (c, a, d), (c, d, a), (d, a, c), (d, c, a)\}, \\
S_4 &= \{(b, c, d), (b, d, c), (c, b, d), (c, d, b), (d, b, c), (d, c, b)\}.
\end{aligned}
\tag{3.26}
\]


Thus, the P(n, m) permutations of n objects taken m at a time can be partitioned into
subsets, each of which consists of the m! arrangements of an m-element subset of A. This
means that there are P(n, m)/m! m-element subsets of an n-element set. There are two
common notations for the number of combinations of n objects taken m at a time:

\[ C(n, m) = \binom{n}{m} = \frac{n!}{m!\,(n-m)!}. \tag{3.27} \]

Since C(n, m) is the number of ways of choosing m out of n objects, it is also called “n
choose m.” Notice that C(n, m) = C(n, n − m), so the number of ways to choose m out
of n objects to serve some purpose is the same as the number of ways to choose n − m of
the objects not to serve. Most of the time we’ll use C(n, m) only for 0 ≤ m ≤ n, though
in what follows we will define C(n, m) for m < 0 and m > n. Notice that C(0, 0) = 1 is
meaningful, in that there is precisely one zero-element subset of the empty set.
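The partitioning argument that leads to P(n, m)/m! can be replayed by machine for A = {a, b, c, d} and m = 3. This is a sketch only; the variable names are illustrative. It groups the arrangements by the set of elements they use and checks that each group has m! members.

```python
from collections import defaultdict
from itertools import permutations
from math import comb, factorial

A = "abcd"
m = 3

# Group the P(4, 3) arrangements by the set of elements they contain.
groups = defaultdict(list)
for arrangement in permutations(A, m):
    groups[frozenset(arrangement)].append(arrangement)

for subset, members in groups.items():
    print(sorted(subset), len(members))

# Number of groups is P(n, m) / m!, which is C(n, m).
print(len(groups), comb(len(A), m))
```

Each printed group corresponds to one of the sets in the family of subsets of S, and every group has the same size m! = 6.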


Example 3.4.15. Your club of 50 members is going to choose a steering committee of
5 people. Each such committee can be thought of as a 5-element subset of a 50-element
set. The number of these subsets is

\[ \binom{50}{5} = \frac{50!}{5!\,45!} = \frac{50 \cdot 49 \cdot 48 \cdot 47 \cdot 46}{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1} = 2{,}118{,}760. \tag{3.28} \]

To prove rigorously that an n-element set has C(n, m) subsets of cardinality m, you
need the following:


Theorem 3.4.16. Suppose n ≥ 0 and 1 ≤ k ≤ n. Then

\[ \binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k}. \tag{3.29} \]


Proving Theorem 3.4.16 does not require induction, only some straightforward al- 
gebraic manipulation (Exercise 6). You'll need it in the inductive step when you prove 
the result we’re after (Exercise 7). 
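Before attempting the algebra, it can be reassuring to spot-check the identity of Theorem 3.4.16 numerically. The sketch below is no substitute for the proof requested in Exercise 6; the function name and the range of n tested are arbitrary choices.

```python
from math import comb


def pascal_identity_holds(n, k):
    """Check C(n, k-1) + C(n, k) == C(n+1, k) for one pair with 1 <= k <= n."""
    return comb(n, k - 1) + comb(n, k) == comb(n + 1, k)


# Test every admissible k for n = 1, ..., 29 (an arbitrary cutoff).
checked = all(pascal_identity_holds(n, k)
              for n in range(1, 30)
              for k in range(1, n + 1))
print(checked)
```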


Theorem 3.4.17. Suppose A is a finite set of cardinality n, and let 0 ≤ m ≤ n. Then the
number of subsets of A of cardinality m is

\[ C(n, m) = \binom{n}{m} = \frac{n!}{m!\,(n-m)!}. \]

Example 3.4.18. The state lottery is played by drawing six numbers from $N_{51}$, and a
winner is declared if someone matches all six numbers, without any consideration of
order. The number of ways a lottery drawing can turn out is C(51, 6) = 18,009,460.


Choosing an m-element subset of an n-element set is the same as partitioning the set 
into two subsets, one of size m and one of size n — m. We can generalize this process in 
the following way. 

Suppose |A| = n and we want to partition A into p ≥ 1 distinguishable subsets
whose cardinalities are $n_1, n_2, \ldots, n_p$, where $\sum_{k=1}^{p} n_k = n$ and none of the $n_k$
are zero. How many ways can we do it? There are two informal ways to see what
the answer is. The first is to imagine choosing $n_1$ of the n objects as the first subset,
then choosing $n_2$ of the remaining $n - n_1$ to be the second subset, then $n_3$ of
the remaining $n - n_1 - n_2$, etc. It would seem natural that multiplying these
combinations would produce the correct answer for this multistep process. If so, then we
have

\[
\begin{aligned}
&\binom{n}{n_1}\binom{n-n_1}{n_2}\binom{n-n_1-n_2}{n_3}\cdots\binom{n_p}{n_p} \\
&\quad= \frac{n!}{n_1!\,(n-n_1)!}\cdot\frac{(n-n_1)!}{n_2!\,(n-n_1-n_2)!}\cdot\frac{(n-n_1-n_2)!}{n_3!\,(n-n_1-n_2-n_3)!}\cdots\frac{n_p!}{n_p!\,0!} \\
&\quad= \frac{n!}{n_1!\,n_2!\,n_3!\cdots n_p!}.
\end{aligned}
\tag{3.30}
\]


Here’s another way to see that the final expression in Eq. (3.30) is correct. Imagine n
ordered slots, and group together the first $n_1$ slots, group the next $n_2$ slots, group the
next $n_3$, etc. For a given partition of the n objects into (unordered) subsets of size $n_1$,
$n_2, \ldots, n_p$, imagine the $n_1$ elements of the first subset being placed into the first $n_1$
slots in any order, the $n_2$ elements of the second subset placed into the next $n_2$ slots
in any order, etc. Dumping the unordered subsets into the ordered slots in this way
produces a permutation of the n objects, so if we can calculate the number of ways
of placing the subset elements into the grouped slots, we can know what fraction of
the n! permutations of the n objects correspond to this one partition. Now there are
$n_1!$ ways to place the elements of the first subset into the first group of slots, $n_2!$ ways
to place the elements of the second subset into the second group of slots, etc. Since
all steps of the arranging are independent of each other, we conclude that there are
$n_1!\,n_2!\cdots n_p!$ ways to arrange the elements. Each partition therefore accounts for
$n_1!\,n_2!\cdots n_p!$ of the n! permutations, so the number of partitions is n! divided by this
product. Thus we have an informal argument for the following.


Theorem 3.4.19. Suppose |A| = n, and let $n_1, n_2, \ldots, n_p \in N$ be such that $\sum_{k=1}^{p} n_k = n$.
Then the number of ways to partition A into subsets of size $n_1, n_2, \ldots, n_p$ is
$n!/(n_1!\,n_2!\cdots n_p!)$.


Example 3.4.20. Your club of 50 members is going to choose a president, vice-president,
secretary/treasurer, and two committees, each with 5 members. No one may serve in more
than one capacity. This process can be thought of as partitioning a 50-element set into
subsets of size 1, 1, 1, 5, 5, and 37. The number of ways this can be done is

\[ \binom{50}{1}\binom{49}{1}\binom{48}{1}\binom{47}{5}\binom{42}{5} = \frac{50!}{1!\,1!\,1!\,5!\,5!\,37!} = 153{,}453{,}043{,}779{,}235{,}200. \tag{3.31} \]
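Both informal routes to Theorem 3.4.19 can be computed side by side. In this sketch, `multinomial` evaluates n!/(n_1! n_2! ··· n_p!) directly and `stepwise` multiplies the successive combinations as in Eq. (3.30); both function names are illustrative.

```python
from math import comb, factorial


def multinomial(sizes):
    """n! / (n_1! n_2! ... n_p!) where n is the sum of the sizes."""
    result = factorial(sum(sizes))
    for size in sizes:
        result //= factorial(size)  # each partial quotient is still an integer
    return result


def stepwise(sizes):
    """Choose each subset in turn from the elements that remain."""
    remaining = sum(sizes)
    total = 1
    for size in sizes:
        total *= comb(remaining, size)
        remaining -= size
    return total


club = [1, 1, 1, 5, 5, 37]
print(multinomial(club), stepwise(club))
```

The two functions agree on the club example, and the final factor in `stepwise` is C(37, 37) = 1, which is why it does not appear in the product of binomial coefficients.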


3.4.4 Counting examples 


Let’s put together all the counting techniques we’ve developed to answer some more 
complex questions. 


Example 3.4.21. Suppose you toss a coin 10 times and observe the sequence of outcomes.
From Example 3.4.10, we know that there are $2^{10}$ possible outcomes. We want to count
the number of these outcomes that have exactly three Heads. Imagine having 10 letters,
3 Hs and 7 Ts, and we are going to fill 10 slots with these letters. We want to count
the different arrangements of the Hs and Ts. Each such outcome can be thought of as a
three-element subset of $N_{10}$, where the three numbers chosen are the slots where the Hs
appear. There are C(10, 3) = (10 · 9 · 8)/(3 · 2 · 1) = 120 such choices.
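The correspondence between sequences with exactly three Heads and three-element subsets of slot numbers can be checked by enumerating both sides. A minimal sketch, with illustrative variable names:

```python
from itertools import combinations, product
from math import comb

# Filter all 2**10 toss sequences down to those with exactly three Heads.
three_heads = [seq for seq in product("HT", repeat=10) if seq.count("H") == 3]

# Alternatively, choose which three of the slots 1, ..., 10 hold the Heads.
head_positions = list(combinations(range(1, 11), 3))

print(len(three_heads), len(head_positions), comb(10, 3))
```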


Example 3.4.22. A box of lightbulbs contains 100 bulbs, 3 of which are defective. If
we choose 10 bulbs from the box, we want to calculate how many ways there are to get
precisely one of the defective bulbs. Such a choice of 10 bulbs consists of nine of the
97 good bulbs chosen without regard to order, and one of the defective bulbs chosen
from the three. Multiplying these, we have C(97, 9) × C(3, 1) ways of choosing the bulbs
in which precisely one is defective.


Theorem 3.3.6 says that if A and B are disjoint, then |A ∪ B| = |A| + |B|. Applied
to counting the number of ways of performing a task, we call this theorem the sum rule. If
counting the number of ways of performing a task must be broken up into disjoint cases,
the sum rule says simply that the counts for the different cases are added to
produce the total number of ways of performing the task.


Example 3.4.23. A license plate consists of up to 7 characters, taken from the 26 letters
and the digits 0-9. If there is room for spaces, they are not counted in the arrangement;
for example, there is no distinction between TRU BLU and TRUBL U. We count
the total number of plates California can issue. There are different cases based on the
number of characters used. The number of one-character license plates is 36; the number
of two-character license plates is $36^2$, etc. Thus the total number of license plates is
$36 + 36^2 + 36^3 + \cdots + 36^7$.
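The sum rule in this example is easy to evaluate by machine. The sketch below tallies the plates case by case and compares the total with the closed form for a finite geometric sum, which is a standard identity rather than something derived in this section.

```python
ALPHABET_SIZE = 36  # 26 letters plus 10 digits

# One disjoint case for each plate length from 1 through 7.
plates_by_length = {k: ALPHABET_SIZE ** k for k in range(1, 8)}
total_plates = sum(plates_by_length.values())

# Geometric-sum closed form: 36 + 36^2 + ... + 36^7 = (36^8 - 36) / 35.
closed_form = (ALPHABET_SIZE ** 8 - ALPHABET_SIZE) // (ALPHABET_SIZE - 1)

print(total_plates, closed_form)
```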


Sometimes it’s easier to calculate |A| by calculating |U| and |A’|, then using the fact 
that |A| + |A’| = |U|. 


Example 3.4.24. Your club of 30 women and 20 men is going to choose a committee
of 5. We calculate the number of ways to choose the committee if it must include at least
1 man. The number of ways to choose the committee with no restrictions is C(50, 5),
and the number of ways to choose the committee with no men is C(30, 5). Thus the
number of ways to choose the committee where there is at least 1 man is C(50, 5) −
C(30, 5).
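Counting by the complement can be cross-checked against the longer route of adding up the disjoint cases of exactly 1, 2, 3, 4, or 5 men. A short sketch, with illustrative variable names:

```python
from math import comb

# Complement: all committees minus the committees with no men.
by_complement = comb(50, 5) - comb(30, 5)

# Sum rule: choose the men first, then fill the remaining seats with women.
by_cases = sum(comb(20, men) * comb(30, 5 - men) for men in range(1, 6))

print(by_complement, by_cases)
```

Both routes give the same number, but the complement needs two combinations where the case analysis needs ten.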


3.4.5 The binomial theorem 


Combinations pop up in several places in mathematics, and we want to take some time
now to see an important one. The appearance of C(n, m) in a variety of situations reveals
some nice ties between seemingly unrelated mathematical ideas. First, there’s a nice way
to visualize Theorem 3.4.16 in what is called Pascal’s triangle, though to create it we need
to extend the definition of C(n, m). For a fixed n ≥ 0, we want to allow m < 0 and
m > n, and in either case, we define C(n, m) = 0. You can think of this as saying that
there are zero ways to choose m elements from an n-element set if m < 0 or m > n. With
these definitions, Theorem 3.4.16 is true for all n ≥ 0 and all k ∈ Z (Exercise 8).

We construct Pascal’s triangle row by row, beginning with a row that corresponds to
n = 0, which we’ll therefore call row zero. The next row corresponds to n = 1, etc., and
for each row, the entries correspond to k = 0, 1, 2, ..., n. (See Fig. 3.6.) Row zero consists
of the single entry 1, which is C(0, 0). To create what we are calling the first row, imagine
zeroes to the left and right of C(0, 0) in row zero, which we may think of as C(0, −1) and
C(0, 1), respectively. Create the two entries in row one by adding adjacent entries in row
zero. By Theorem 3.4.16, the first entry in this first row is C(0, −1) + C(0, 0) = C(1, 0),
and the second is C(0, 0) + C(0, 1) = C(1, 1). Thus the entries in row one are all
possible C(1, k) for 0 ≤ k ≤ 1. Continuing in this fashion to create the second row,
imagine C(1, −1) = 0 on the left end of the first row and C(1, 2) = 0 on the right end.
Adding adjacent entries of the first row, we generate C(2, 0), C(2, 1), and C(2, 2) by
Theorem 3.4.16. Finishing out as many rows of Pascal’s triangle as you like, we have that
row n consists of C(n, k) for 0 ≤ k ≤ n.
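The row-by-row construction translates almost word for word into a short program. In this sketch (the function name `pascal_rows` is an illustrative choice), each new row is formed by padding the previous row with a zero on each end and adding adjacent entries.

```python
from math import comb


def pascal_rows(last_row):
    """Return rows 0 through last_row of Pascal's triangle as lists."""
    row = [1]  # row zero is the single entry C(0, 0)
    rows = [row]
    for _ in range(last_row):
        padded = [0] + row + [0]  # the imagined zeroes on each end
        row = [padded[i] + padded[i + 1] for i in range(len(padded) - 1)]
        rows.append(row)
    return rows


for n, entries in enumerate(pascal_rows(6)):
    print(n, entries)
```

Comparing each entry with `comb(n, k)` confirms that row n holds C(n, k) for 0 ≤ k ≤ n.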

Here’s an important place where the entries in Pascal’s triangle appear. Suppose
you’re going to expand the expression $(a + b)^n$ for some n ∈ N. Rather than multiply
out (a + b)(a + b)···(a + b) using the extended distributive property, there is a much
easier way to see what the terms are, and it involves C(n, k) in the coefficients.


    n = 0                  0   1   0
    n = 1                0   1   1   0
    n = 2                  1   2   1
    n = 3                1   3   3   1
    n = 4              1   4   6   4   1
    n = 5            1   5  10  10   5   1
    n = 6          1   6  15  20  15   6   1

Figure 3.6 Pascal’s triangle.


Theorem 3.4.25 (Binomial theorem). Let a, b ∈ R \ {0}, and let n ∈ W. Then

\[
\begin{aligned}
(a + b)^n &= \binom{n}{0} a^n b^0 + \binom{n}{1} a^{n-1} b^1 + \binom{n}{2} a^{n-2} b^2 + \cdots + \binom{n}{n} a^0 b^n \\
&= \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k.
\end{aligned}
\tag{3.32}
\]


The only reason we do not allow a or b to be zero in Theorem 3.4.25 is that $0^0$ is not
defined in R, so Eq. (3.32) would produce some undefined terms. But if either a or b is
zero, the expansion of $(a + b)^n$ is not particularly interesting. Thus we omit it. Using the
entries from Pascal’s triangle, we have

\[
\begin{aligned}
(a + b)^0 &= 1a^0 b^0 = 1, \\
(a + b)^1 &= 1a^1 b^0 + 1a^0 b^1 = a + b, \\
(a + b)^2 &= a^2 + 2ab + b^2, \\
(a + b)^3 &= a^3 + 3a^2 b + 3ab^2 + b^3, \text{ etc.}
\end{aligned}
\tag{3.33}
\]

There are several ways to convince ourselves that the binomial theorem is true, using
somewhat open-ended arguments that appeal to some of our previous counting
techniques. One way is to note that applying the extended distributive property to
(a + b)(a + b)···(a + b) is tantamount to creating a whole bunch of terms of a form
like aaabbaab, where we go to each (a + b) factor and select either a or b. Since each
factor allows for two possible choices, there are $2^n$ terms generated, and every one is of
the form $a^{n-k} b^k$ for some 0 ≤ k ≤ n. The question we must answer is, for a given k, how
many times does $a^{n-k} b^k$ appear in all the distribution of the multiplication? If k of the
factors supply us with b and the remaining factors provide us with a, then the number of
times $a^{n-k} b^k$ appears in the grand sum is the same as the number of ways of choosing k
of the n factors to provide us with b (or n − k of the factors to provide us with a). That is,
the term $a^{n-k} b^k$ is produced C(n, k) times in the distribution.
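This informal argument can be imitated directly: go to each of the n factors, select a or b, and tally the resulting strings by how many times b was selected. The sketch below does this for n = 5; the function name `count_terms` is an illustrative choice.

```python
from collections import Counter
from itertools import product
from math import comb


def count_terms(n):
    """Tally the 2**n strings of a's and b's by the number of b's they contain."""
    counts = Counter()
    for choice in product("ab", repeat=n):  # one selection from each factor
        counts[choice.count("b")] += 1
    return counts


tally = count_terms(5)
for k in range(6):
    print(k, tally[k], comb(5, k))
```

For each k, the tally of strings with k copies of b matches C(5, k), and the tallies add up to 2^5 = 32 terms in all.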

We're going to provide a more rigorous proof of Theorem 3.4.25 than our informal 
argument here. The technique we use in making the inductive step contains some manipu- 
lation of summations that can really come in handy in some of your other coursework, es- 
pecially, for example, differential equations and complex analysis, where you must resort 
to finding what we call a series solution to a problem. Watch how we pull off some first and 
last terms from the summations, then realign the terms by changing the counter variable. 


Proof of Theorem 3.4.25: Let a, b ∈ R \ {0}.
(J1) For n = 0, $(a + b)^0 = 1 = C(0, 0)a^0 b^0$, so Eq. (3.32) is true for n = 0.
(J2) Suppose n ≥ 0 and that Eq. (3.32) is true for n. Then

\[
\begin{aligned}
(a + b)^{n+1} &= (a + b)^n (a + b) = \Bigl[\sum_{i=0}^{n} C(n, i)\, a^{n-i} b^i\Bigr](a + b) \\
&= \sum_{i=0}^{n} C(n, i)\, a^{n-i+1} b^i + \sum_{i=0}^{n} C(n, i)\, a^{n-i} b^{i+1} \\
&= \sum_{i=0}^{n} C(n, i)\, a^{(n+1)-i} b^i + \sum_{i=0}^{n} C(n, i)\, a^{(n+1)-(i+1)} b^{i+1}.
\end{aligned}
\tag{3.34}
\]

To align the terms in the two summations, let j = i + 1 for the second summation
(that is, i = j − 1), and note that 0 ≤ i ≤ n is the same as 1 ≤ j ≤ n + 1.
Making this substitution first, then pulling off a few individual terms, we can
continue Eq. (3.34) several steps.

\[
\begin{aligned}
(a + b)^{n+1} &= \sum_{i=0}^{n} C(n, i)\, a^{(n+1)-i} b^i + \sum_{j=1}^{n+1} C(n, j-1)\, a^{(n+1)-j} b^j \\
&= \sum_{i=1}^{n} C(n, i)\, a^{(n+1)-i} b^i + \sum_{j=1}^{n} C(n, j-1)\, a^{(n+1)-j} b^j \\
&\quad + C(n, 0)\, a^{n+1} b^0 + C(n, n)\, a^0 b^{n+1}.
\end{aligned}
\tag{3.35}
\]

Now C(n, 0) = C(n + 1, 0) and C(n, n) = C(n + 1, n + 1). Also, we may
let k = i = j and combine the two summations and apply Theorem 3.4.16 to
have

\[
\begin{aligned}
(a + b)^{n+1} &= \sum_{k=1}^{n} \bigl[C(n, k) + C(n, k-1)\bigr] a^{(n+1)-k} b^k \\
&\quad + C(n+1, 0)\, a^{(n+1)-0} b^0 + C(n+1, n+1)\, a^0 b^{n+1} \\
&= \sum_{k=1}^{n} C(n+1, k)\, a^{(n+1)-k} b^k \\
&\quad + C(n+1, 0)\, a^{(n+1)-0} b^0 + C(n+1, n+1)\, a^0 b^{n+1} \\
&= \sum_{k=0}^{n+1} C(n+1, k)\, a^{(n+1)-k} b^k,
\end{aligned}
\tag{3.36}
\]

which verifies that Eq. (3.32) holds for n + 1. ∎


Corollary 3.4.26. For all n ∈ W,

\[ \sum_{k=0}^{n} \binom{n}{k} = 2^n. \tag{3.37} \]

Proof: Let a = b = 1 in Eq. (3.32). ∎


Suppose A is a finite set with cardinality n. Corollary 3.4.26 implies that A has $2^n$
subsets, for the left-hand side of Eq. (3.37) is the sum of the number of subsets of an
n-element set with all possible numbers of elements. Another way to see the same result
is to create a one-to-one correspondence between the family of all subsets of A and the
collection of the $2^n$ functions from A to {0, 1}. For $A_1 \subseteq A$, we may define

\[
f(a) =
\begin{cases}
1, & \text{if } a \in A_1; \\
0, & \text{if } a \notin A_1.
\end{cases}
\tag{3.38}
\]

Every distinct subset of A generates a distinct function, and by Theorem 3.4.9, there
are $2^n$ functions from A to {0, 1}.
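The correspondence between subsets and functions into {0, 1} can be made concrete. In this sketch, each tuple of bits plays the role of one function f as in Eq. (3.38), and the subset it describes collects the elements sent to 1; the function name and the sample set are illustrative.

```python
from itertools import product
from math import comb


def subsets_via_indicators(elements):
    """Build every subset of `elements` from a function into {0, 1}."""
    elements = list(elements)
    subsets = []
    for bits in product((0, 1), repeat=len(elements)):
        # bits[i] is the value f(elements[i]); keep the elements mapped to 1.
        subsets.append(frozenset(e for e, bit in zip(elements, bits) if bit == 1))
    return subsets


all_subsets = subsets_via_indicators("abcd")
print(len(all_subsets), len(set(all_subsets)), sum(comb(4, k) for k in range(5)))
```

The three printed numbers agree: 2^4 functions, 2^4 distinct subsets, and the sum from Corollary 3.4.26.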


3.4 Counting Methods and the Binomial Theorem 


EXERCISES 


10. 


ll. 


12. 


13. 


14. 


1. Finish the verification that the equivalence defined on $A \times B$ on page 118 is an
equivalence relation by showing it has properties E2 and E3.

2. Prove Theorem 3.4.2: If $A_1$, $A_2$, and $B$ are sets, then

\[ (A_1 \cup A_2) \times B = (A_1 \times B) \cup (A_2 \times B). \]

Furthermore, if $A_1 \cap A_2 = \emptyset$, then $(A_1 \times B) \cap (A_2 \times B) = \emptyset$.

3. Prove Theorem 3.4.4: Suppose $A$ and $B$ are nonempty, finite sets. Then
$|A \times B| = |A| \times |B|$.¹⁵

4. Prove Theorem 3.4.6: Suppose $\{A_k\}_{k=1}^n$ is a finite family of finite sets. Then
$\left|\prod_{k=1}^n A_k\right| = \prod_{k=1}^n |A_k|$.

5. Prove Theorem 3.4.12: Let $A$ and $B$ be nonempty sets with $|A| = m \le n = |B|$.
Then there are $P(n, m)$ distinct one-to-one functions from $A$ to $B$.

6. Prove Theorem 3.4.16: Suppose $n > 0$ and $1 \le k \le n$. Then
$\binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k}$.

7. Prove Theorem 3.4.17: Suppose $A$ is a finite set of cardinality $n$, and let $0 \le m \le n$.
Then the number of subsets of $A$ of cardinality $m$ is

\[ C(n, m) = \binom{n}{m} = \frac{n!}{m!\,(n - m)!}. \]

8. Verify that Eq. (3.29) holds if $k \le 0$ or $k \ge n + 1$.

9. The State Lottery Commission (Example 3.4.18) is considering including the number 52
in its drawing, so that the game is played by drawing six numbers from $\mathbb{N}_{52}$.
By what percentage is the number of possible outcomes increased?

10. In California, a standard license plate consists of a digit 1-9, followed by three letters,
followed by three more digits 0-9, such as 3AAG045. If all constructions of this form
are considered usable, how many standard license plates can California issue?

11. Twenty horses run a race in which prizes are given for win, place, and show (first,
second, and third places). How many outcomes are there for the race?

12. Pizza Peddler offers 12 different toppings, and is having a special on their two-topping
pizzas. How many distinct ways are there to order one of their special pizzas?

13. You overheard that a TRUE/FALSE test of 10 questions had 7 TRUE and 3 FALSE
answers. If you take the test with this knowledge, how many ways are there to fill out
the answer sheet?

14. Your club consists of 50 members, 22 men and 28 women. A committee of 5 is to be
chosen, which must contain 2 men and 3 women. How many distinct committees
can be chosen?


15. Your office is 10 blocks east and 7 blocks north of your apartment, in a section of
the city where the streets run uninterruptedly north-south and east-west. Every day
you go to work by walking along these streets the 17 blocks, always going either east
or north. How many distinct paths to work are there?¹⁶

16. When you walk home from the office (as in the previous problem), you always like
to stop at the ice cream parlor located on the corner two blocks south and five blocks
west of the office. How many distinct paths back home are there if you always stop
for ice cream?

17. Twelve jurors and 2 alternates are chosen from 30 people summoned for jury duty.
How many ways can this be done?

18. Let a word be defined as any arrangement of letters from the alphabet, so that, for
example, EIEIO is a word. If the vowels are {A, E, I, O, U}, how many four-letter
words contain at least one vowel?

19. Your club of 30 women and 20 men is going to choose a committee of 5, which must
not consist of 5 people of the same gender. How many ways can it be done?


¹⁵ Induct on $|A| \ge 1$, thinking of $|B|$ as fixed.
¹⁶ See Example 3.4.21.
The Real Numbers


Let’s look at some defining characteristics of the area of mathematics we call analysis. 
Given a set S, its elements might be endowed with a measure of size. The size, or norm, 
of an element x is typically denoted |x|, like absolute value on R, or perhaps ||x||. The 
norm of an element will always be a nonnegative real number, so that a norm is really a 
function $|\cdot| : S \to \mathbb{R}^+ \cup \{0\}$. There are two other characteristics that a norm must have.
You've already run into all three in the context of the real numbers in Section 2.3.3. They
are properties N1 through N3 beginning on page 57.

Measuring the sizes of elements is only one type of structure that can be placed on a set
that puts it squarely in the field of analysis, but some notion of measure with nonnegative 
real numbers is characteristic of structures in analysis. For example, some structures do 
not have a norm, but they do have a way of measuring some idea of distance between 
elements. Such a measure of distance is called a metric. Whether S is endowed with a 
norm or a metric, the measure it represents is inextricably tied to R. Naturally, then, R is 
at the heart of all analysis, and one could argue that no analysis into any structure except 
R should be undertaken until one understands all the axioms and fundamental results 
of the theory of real numbers. 

We said at the outset that this book is not designed to be a rigorous study of the 
foundations of mathematics. However, most of the assumptions about R in Chapter 0 
are standard axiomatic ones. It was primarily with assumptions A21 and A22 that we 
glossed over a lot of groundwork that could have been laid. In all fairness, however, it 
should be said that all the assumed properties of R in Section 0.2 should be addressed 
by beginning from scratch with the set {0, 1} and progressing through W, Z, Q, to R, 
and seeing how the assumptions are motivated along the way to developing this smooth 
continuum we envision as the set of real numbers. 

Be that as it may, a norm or metric on S, if one exists, lends itself to much fruitful 
study: Sequences and their convergence, continuity, and calculus are but a few. In this 
chapter we address some fundamental properties of R that arise out of the assumptions 
from Chapter 0. These properties are important not only because they apply to R, but
also because they are typical of properties of many other structures in analysis. In your 
later coursework in analysis, you'll study concepts that sound peculiarly similar to those 
here, even though the set you're dealing with might be, say, a set of functions. 


4.1 The Least Upper Bound Axiom


The least upper bound (LUB) axiom is a standard axiom of the real numbers, endowing 
it with some of its familiar features. For example, the way we visualize R as a numberline 
with no holes is due to the LUB axiom. There are two other characteristics of R that are 
logically equivalent to the LUB axiom. One is called the nested interval property (NIP) and 
has to do with what you get when you intersect a whole bunch of intervals in R that have 
certain properties. The other is called completeness and has to do with the way certain 
sequences behave when the terms get close to each other. It is possible to assume any 
one of the LUB axiom, the NIP, or completeness and derive the other two as theorems. 
Assuming the LUB property as an axiom is probably most common, so we choose that 
route. In this chapter, we'll prove the NIP from the LUB axiom and completeness from 
the NIP, and look into the converses of these theorems to show that all three are logically 
equivalent. In this section we explore the LUB property of R in some depth to get a feel for 
what it means and derive some immediately important implications. First, a definition. 


Definition 4.1.1. A set $A \subseteq \mathbb{R}$ is said to be bounded from above if there exists $M_1 \in \mathbb{R}$
such that $a \le M_1$ for all $a \in A$. $A$ is said to be bounded from below if there exists
$M_2 \in \mathbb{R}$ such that $a \ge M_2$ for all $a \in A$. $A$ is said to be bounded if there exists $M > 0$
such that $|a| \le M$ for all $a \in A$.


The following theorem seems like the most obvious thing in the world, but it must 
be demonstrated in terms of Definition 4.1.1. 


Theorem 4.1.2. A set $A \subseteq \mathbb{R}$ is bounded if and only if it is bounded from above and below.


Proving the $\Rightarrow$ direction of Theorem 4.1.2 is easier. For if $A$ is bounded, then the
guaranteed $M > 0$ such that $|a| \le M$ for all $a \in A$ should clearly suggest values of $M_1$
and $M_2$ such that $a \le M_1$ and $a \ge M_2$ for all $a \in A$. However, in proving the $\Leftarrow$ direction,
you must use the existence of $M_1$ and $M_2$ such that $M_2 \le a \le M_1$ for all $a \in A$ to create
a single value of $M > 0$ such that $-M \le a \le M$ for all $a \in A$ (Exercise 1).
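
If you like to experiment before you prove, here is a minimal Python sketch of the idea behind the $\Leftarrow$ direction. The function name `symmetric_bound` and the sample set are made up for illustration, and the extra `+ 1` is an assumption of this sketch (it is not part of the hint for Exercise 1) that keeps $M$ strictly positive even when both bounds are zero. A finite sample can only illustrate the claim; it proves nothing.

```python
def symmetric_bound(upper, lower):
    """Return M > 0 such that -M <= a <= M whenever lower <= a <= upper."""
    # |upper| + |lower| is at least as large as each of |upper| and |lower|.
    # The extra 1 is this sketch's own tweak so that M > 0 even if both are 0.
    return abs(upper) + abs(lower) + 1


sample = [-3.5, -1, 0, 2, 4.25]      # a hypothetical finite piece of A
M1, M2 = max(sample), min(sample)    # an upper bound and a lower bound for it
M = symmetric_bound(M1, M2)
print(M, all(-M <= a <= M for a in sample))
```

The point of the design is that one number, built from both one-sided bounds, has to do the work of two.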


4.1.1 Least upper bounds 


Suppose $A \subseteq \mathbb{R}$ is nonempty and bounded from above. Among all possible upper bounds
for $A$, we would call $L$ a least upper bound for $A$ if $L$ is an upper bound for $A$ with the
additional property that $L \le N$ for every upper bound $N$. That is, among all upper
bounds for $A$, $L$ is no bigger than any of them. The LUB axiom says that every nonempty
subset of $\mathbb{R}$ that is bounded from above has a least upper bound in $\mathbb{R}$. We state it again
here for the sake of reference.


(A20) Least upper bound property of $\mathbb{R}$: If $A$ is a nonempty subset of $\mathbb{R}$ that is
bounded from above, then there exists $L \in \mathbb{R}$ with the following properties:


(L1) For every $a \in A$, we have that $a \le L$, and
(L2) If $N$ is any upper bound for $A$, it must be that $N \ge L$.

A natural first question to ask is whether there can be more than one LUB for a
nonempty set $A$. Naturally, the answer is no, which you'll prove in Exercise 2.

Theorem 4.1.3. The least upper bound of $A \subseteq \mathbb{R}$ is unique.


Interval notation is a convenient shorthand for a subset of R consisting of a single, 
clean chunk of the numberline. Here are some illustrations of the notation: 


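\begin{align}
(a, b) &= \{x \in \mathbb{R} : a < x < b\} \tag{4.1}\\
[a, b] &= \{x \in \mathbb{R} : a \le x \le b\} \tag{4.2}\\
(a, +\infty) &= \{x \in \mathbb{R} : x > a\} \tag{4.3}\\
(-\infty, a] &= \{x \in \mathbb{R} : x \le a\}. \tag{4.4}
\end{align}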


Intervals of the forms (4.1) and (4.3) are called open intervals, and those of the form 
(4.2) and (4.4) are called closed intervals. The motivation for these terms will become 
clear in Section 4.2 where we discuss open and closed sets in general. 

The Greek letter $\epsilon$ (epsilon) has been used so much in analysis to represent an arbitrary
positive real number that it has come to have a personality all its own. Although $\epsilon$ is not
generally thought to be of any particular size, it is usually present in theorems and proofs
because smaller values of $\epsilon$ represent the primary obstacle to overcome in concocting
the proof. At first, when you read a theorem using the phrase “for all $\epsilon > 0$,” you might
mentally insert the additional phrase “no matter how small” in there. This might help you
catch the spirit of the theorem, but just remember it might not be particularly relevant.
With this in mind, we can now present an alternate form of the LUB property that is
often easier to work with than A20.


Theorem 4.1.4. Given $A \subseteq \mathbb{R}$, $L$ is the least upper bound for $A$ if and only if the following
two conditions hold.


(M1) For every $\epsilon > 0$ (no matter how large), $(L, L + \epsilon) \cap A = \emptyset$.
(M2) For every $\epsilon > 0$ (no matter how small), there exists $a \in A \cap (L - \epsilon, L]$.


Theorem 4.1.4 is a natural way to visualize the LUB property in terms of how intervals
to the left and right of $L$ intersect the set $A$. Every interval of the form $(L, L + \epsilon)$, no
matter how large $\epsilon$ is, must not contain any elements of $A$. Also, every interval of the
form $(L - \epsilon, L]$, no matter how small $\epsilon$ is, must contain some element of $A$. You'll
prove Theorem 4.1.4 in Exercise 3. The logic of the proof deserves a comment, so here's
a suggestion about how to proceed. Theorem 4.1.4 says $(\mathrm{L1} \wedge \mathrm{L2}) \leftrightarrow (\mathrm{M1} \wedge \mathrm{M2})$. Let's
look only at the $\to$ direction, for the $\leftarrow$ direction would be similar. In showing $\to$,
Exercises 3e and 3h from Section 1.2 imply that

\begin{align}
(\mathrm{L1} \wedge \mathrm{L2}) \to (\mathrm{M1} \wedge \mathrm{M2}) &\Leftrightarrow [(\mathrm{L1} \wedge \mathrm{L2}) \to \mathrm{M1}] \wedge [(\mathrm{L1} \wedge \mathrm{L2}) \to \mathrm{M2}] \notag\\
&\Leftrightarrow [(\mathrm{L2} \wedge \neg\mathrm{M1}) \to \neg\mathrm{L1}] \wedge [(\mathrm{L1} \wedge \neg\mathrm{M2}) \to \neg\mathrm{L2}]. \tag{4.5}
\end{align}

Thus, to prove the $\to$ direction of Theorem 4.1.4, you will want to show $\mathrm{L1} \to \mathrm{M1}$
by contrapositive in the presence of L2, then show $\mathrm{L2} \to \mathrm{M2}$ by contrapositive in the
presence of L1. Then you can prove $\leftarrow$ in a similar way.
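
Because (4.5) is a statement of pure logic, you can check it mechanically. The following Python sketch (the helper names `implies` and `forms_agree` are ours, not from Section 1.2) runs through all 16 truth assignments for L1, L2, M1, and M2 and confirms that the three forms appearing in (4.5) always have the same truth value. Treat it as a sanity check, not as a substitute for the exercises cited above.

```python
from itertools import product


def implies(p, q):
    """Truth value of the conditional p -> q."""
    return (not p) or q


def forms_agree(l1, l2, m1, m2):
    """Compare the three forms of the statement that appear in Eq. (4.5)."""
    direct = implies(l1 and l2, m1 and m2)
    split = implies(l1 and l2, m1) and implies(l1 and l2, m2)
    contrapositive = (implies(l2 and not m1, not l1)
                      and implies(l1 and not m2, not l2))
    return direct == split == contrapositive


# Exhaust every assignment of True/False to (L1, L2, M1, M2).
all_agree = all(forms_agree(*row) for row in product([False, True], repeat=4))
print(all_agree)
```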


Example 4.1.5. Show that the set $\{1\}$ has LUB 1.


Solution: Pick $\epsilon > 0$. Because $(1, 1 + \epsilon) \cap \{1\} = \emptyset$, property M1 holds. Also, because
$1 \in (1 - \epsilon, 1] \cap \{1\}$, property M2 holds. Thus 1 is the LUB of $\{1\}$. ■


With Theorem 4.1.4, it’s also easy to see why b is the LUB of intervals of the form 
(a, b) or [a, b] (Exercise 4). 


4.1.2 The Archimedean property of $\mathbb{R}$

The LUB property of R has an important implication on the natural numbers, saying 
that they are unbounded in R. 


Theorem 4.1.6. For all $x > 0$, there exists $n \in \mathbb{N}$ such that $n > x$.


To prove Theorem 4.1.6, suppose N is bounded in R. Then its LUB can be shown 
to pit properties M1 and M2 against each other to produce a contradiction 
(Exercise 5). 

Theorem 4.1.6 has a logically equivalent form called the Archimedean property of 
R. We have already pointed out briefly that the ancient Greeks were very sophisticated 
mathematically. One idea that they used involved what we would call an infinitesimal 
number. Different from zero, an infinitesimal was considered to be smaller than every 
positive number. One salient property of an infinitesimal is that you can add it to itself 
any finite number of times and still have an infinitesimal. The following theorem says 
in effect that there are no real number infinitesimals. It’s named after Archimedes, who 
argued against the use of infinitesimals. You'll prove it in Exercise 6. 


Theorem 4.1.7 (Archimedean Property). For every $\epsilon, r > 0$, there exists $n \in \mathbb{N}$ such
that $n\epsilon > r$.


Thus no matter how small a positive number $\epsilon$ is, and no matter how large $r > 0$
is, you can add $\epsilon$ to itself some $n$ number of times to produce a sum that exceeds $r$, so
that $\epsilon$ is therefore not an infinitesimal. The Archimedean property often proves to be
useful in a slightly different form, using the specific case $r = 1$ and writing the inequality
as $1/n < \epsilon$. In this form, the Archimedean property says no matter how small $\epsilon > 0$ is,
there exists some $n \in \mathbb{N}$ whose reciprocal is less than $\epsilon$.
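
For rational $\epsilon$ and $r$ you can actually compute such an $n$. The sketch below is an illustration under that assumption; the function name `archimedean_n` is hypothetical, and the arithmetic is done with Python's `fractions.Fraction` so that nothing is lost to rounding. It does not replace the proof in Exercise 6, which has to work for every positive real number.

```python
from fractions import Fraction
import math


def archimedean_n(eps, r):
    """Return a natural number n with n * eps > r, given eps > 0 and r > 0."""
    eps, r = Fraction(eps), Fraction(r)
    if eps <= 0 or r <= 0:
        raise ValueError("eps and r must both be positive")
    # floor(r/eps) <= r/eps < floor(r/eps) + 1, so this n exceeds r/eps.
    return math.floor(r / eps) + 1


n = archimedean_n(Fraction(1, 1000), 5)   # n copies of 1/1000 exceed 5
m = archimedean_n(Fraction(3, 100), 1)    # the r = 1 case: 1/m < 3/100
print(n, m)
```

The second call is the reciprocal form: asking for $m\epsilon > 1$ is the same as asking for $1/m < \epsilon$.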
4.1.3 Greatest lower bounds

Now let’s turn the LUB property upside down and discuss bounds from below. We don’t 
have to make any new assumptions concerning the existence of what we call the greatest 
lower bound of a set, for we can derive results from the LUB property by flipping a set 
upside down, so to speak. 


Definition 4.1.8. A number $G \in \mathbb{R}$ is said to be a greatest lower bound (GLB) for a set $A$
if $G$ has the following properties:


(G1) For every $a \in A$, we have that $a \ge G$, and
(G2) If $N$ is any lower bound for $A$, it must be that $N \le G$.


You will prove the following theorems in Exercises 9 and 10: 


Theorem 4.1.9. If $A \subseteq \mathbb{R}$ is a nonempty set bounded from below, then $A$ has a GLB.


Theorem 4.1.10. The greatest lower bound of $A \subseteq \mathbb{R}$ is unique.


4.1.4 The LUB and GLB properties applied to finite sets

The intervals $(a, b)$ and $[a, b]$ illustrate that the LUB and GLB of a set might or might not
actually be in the set. If a set $S = \{a_k\}_{k=1}^n$ is a finite set of real numbers, then we probably
expect that the LUB and GLB of $S$ will be elements of $S$ that we might call a maximum
or minimum value of $S$. In the rest of this section, we prove this fact, which gives us the
right to talk about the maximum and minimum values of a finite set. You'll prove all the
results in the exercises.


Theorem 4.1.11. If $S = \{a_k\}_{k=1}^n$ is a finite set of real numbers, then $S$ is bounded.


Theorem 4.1.12. Let $S = \{a_k\}_{k=1}^n$ be a finite set of positive real numbers. Then there exists
$N \in \mathbb{N}$ such that $1/N < a_k$ for all $1 \le k \le n$.


Theorem 4.1.12 extends the Archimedean property of $\mathbb{R}$ to apply to a finite set.
Be careful, though. The Archimedean property applied to each element of $S$ separately
guarantees that, for all $1 \le k \le n$, there exists $N_k$ such that $1/N_k < a_k$. This is different
from saying that there exists a single $N$ satisfying $1/N < a_k$ for all $1 \le k \le n$. This
distinction occurs quite often in mathematics. It's one thing to say that every question
has an answer. It's quite another to say that there is a single answer that works for every
question.
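
To see the difference between one $N_k$ per element and a single $N$ for the whole set, here is a small Python sketch for rational inputs. The function names and the sample values are hypothetical. Combining the individual $N_k$ by adding them up is one choice that works, since the sum is at least as large as each $N_k$; taking the largest $N_k$ would also work, but at this point in the section we have not yet earned the right to speak of the maximum of a finite set.

```python
from fractions import Fraction
import math


def reciprocal_below(a):
    """Return a natural number N_k with 1/N_k < a, for a single a > 0."""
    a = Fraction(a)
    if a <= 0:
        raise ValueError("a must be positive")
    return math.floor(1 / a) + 1


def common_reciprocal_below(values):
    """Return one N with 1/N < a for every a in a finite list of positive numbers."""
    individual = [reciprocal_below(a) for a in values]
    # N is at least each N_k, so 1/N <= 1/N_k < a_k for every k.
    return sum(individual)


S = [Fraction(1, 2), Fraction(3, 7), Fraction(1, 10)]
N = common_reciprocal_below(S)
print(N, all(Fraction(1, N) < a for a in S))
```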

The Archimedean property says that numbers of the form $1/n$ are clustered around
zero arbitrarily closely. If you zoom in to any interval $(-\epsilon, \epsilon)$, no matter how small $\epsilon$
might be, there will be some $1/n$ in this interval. You cannot zoom in close enough to
find an interval around zero that is devoid of numbers of the form $1/n$. In general, if it's
possible to find some $\epsilon > 0$ such that the interval $(L - \epsilon, L + \epsilon)$ contains no elements
of a set $A$, we say that $A$ can be bounded away from $L$. Theorem 4.1.12 says, in effect, that
a finite set of positive numbers can be bounded away from zero. We state this fact as a
corollary.


Corollary 4.1.13. A finite set of positive real numbers can be bounded away from zero.
That is, if $S = \{a_k\}_{k=1}^n$ is a finite set of positive real numbers, then there exists $M > 0$ such
that $a_k > M$ for all $1 \le k \le n$.


We're ready for the main result. If $S = \{a_k\}_{k=1}^n$ is a finite set, then by Theorem 4.1.11,
$S$ is bounded. Thus by Theorem 4.1.2, it is bounded from above and below. By the LUB
and GLB properties, $S$ has a LUB and GLB. In fact:


Theorem 4.1.14. If $S = \{a_k\}_{k=1}^n$ is a finite set of real numbers, then $S$ contains its LUB
and GLB.


You'll prove the LUB part of Theorem 4.1.14 in Exercise 14. The GLB part would be very 
similar. Theorem 4.1.14 motivates the following definition. 


Definition 4.1.15. Let $S = \{a_k\}_{k=1}^n$ be a finite set of real numbers. Then the LUB and GLB
of $S$ are called the maximum and minimum values of $S$, respectively, and are denoted $\max S$
and $\min S$.


In Section 4.2 the importance of Theorem 4.1.14 will become clear right away. Here 
are two simple illustrations of how it will be important. First, suppose anyone at least 
16 years old can drive, anyone at least 18 can vote, and anyone at least 65 can collect Social 
Security. How can you guarantee that a chosen person can drive, vote, and collect Social 
Security? Certainly you would choose a person whose age is at least max{16, 18, 65}. For 
if we write $M = \max\{16, 18, 65\}$, then $M$ satisfies all three inequalities $M \ge 16$, $M \ge 18$,
and $M \ge 65$.

As another example, suppose every real number within $\epsilon_1$ distance of zero is in set $A$,
and every real number within $\epsilon_2$ of zero is in set $B$. That is, if $|x| < \epsilon_1$, then $x \in A$, and
if $|x| < \epsilon_2$, then $x \in B$. If we let $\epsilon = \min\{\epsilon_1, \epsilon_2\}$, we can be sure that every real number
within $\epsilon$ of zero will be in $A \cap B$. For if $|x| < \epsilon$, then both $|x| < \epsilon_1$ and $|x| < \epsilon_2$ are
true, so that $x \in A$ and $x \in B$.
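
Both illustrations come down to picking one number that satisfies several inequalities at once. Here is a short Python sketch of each; the helper `common_radius` and the particular radii are hypothetical, chosen only to make the example concrete, and the sample points merely spot-check what the argument above proves in general.

```python
from fractions import Fraction

# First illustration: one age that clears every threshold.
thresholds = [16, 18, 65]
M = max(thresholds)
meets_all = all(M >= t for t in thresholds)


def common_radius(*radii):
    """Return one radius that is no larger than any of the given positive radii."""
    if not radii or any(r <= 0 for r in radii):
        raise ValueError("need at least one radius, all positive")
    return min(radii)


# Second illustration: one radius that works for both A and B.
eps1, eps2 = Fraction(1, 4), Fraction(1, 10)   # hypothetical radii for A and B
eps = common_radius(eps1, eps2)
samples = [eps * Fraction(k, 10) for k in range(-9, 10)]   # points with |x| < eps
within_both = all(abs(x) < eps1 and abs(x) < eps2 for x in samples)
print(M, meets_all, eps, within_both)
```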

The terms $\max S$ and $\min S$ are often used when $S$ is not finite in the event that $S$
contains its LUB and GLB.


EXERCISES 


1. Prove Theorem 4.1.2: A set $A \subseteq \mathbb{R}$ is bounded if and only if it is bounded from above
and below.¹

2. Prove Theorem 4.1.3: The least upper bound of $A \subseteq \mathbb{R}$ is unique.

3. Prove Theorem 4.1.4: Given $A \subseteq \mathbb{R}$, $L$ is the least upper bound for $A$ if and only if
the following two conditions hold:

(M1) For every $\epsilon > 0$, $(L, L + \epsilon) \cap A = \emptyset$.
(M2) For every $\epsilon > 0$, there exists $a \in A \cap (L - \epsilon, L]$.

4. Show that $b$ is the LUB of the interval $(a, b)$. (An identical argument will work for
$[a, b]$.)

5. Prove Theorem 4.1.6: For all $x > 0$, there exists $n \in \mathbb{N}$ such that $n > x$.

6. Prove Theorem 4.1.7: For every $\epsilon, r > 0$, there exists $n \in \mathbb{N}$ such that $n\epsilon > r$.²

7. Prove that between any two real numbers $a < b$, there exists a nonzero rational
number. Need some hints?³ ⁴ ⁵ ⁶

8. Prove that between any two real numbers $a < b$, there exists an irrational
number.⁷

9. Prove Theorem 4.1.9 in the following way. Suppose $A \subseteq \mathbb{R}$ is a nonempty set bounded
from below. Let $M$ be any lower bound for $A$, and define $B = \{-a : a \in A\}$.

(a) Show that $B$ is bounded from above by $-M$.

(b) Applying the LUB property to $B$, we conclude the existence of $L \in \mathbb{R}$, the LUB
of $B$. Use the fact that $L$ satisfies properties L1 and L2 with regard to $B$ to show
that $-L$ has properties G1 and G2 with regard to $A$.

10. Prove Theorem 4.1.10: The greatest lower bound of $A \subseteq \mathbb{R}$ is unique.

11. State a theorem analogous to Theorem 4.1.4 for the GLB of a set.

12. Prove Theorem 4.1.11: If $S = \{a_k\}_{k=1}^n$ is a finite set of real numbers, then $S$ is
bounded.⁸

13. Prove Theorem 4.1.12: Let $S = \{a_k\}_{k=1}^n$ be a finite set of positive real numbers. Then
there exists $N \in \mathbb{N}$ such that $1/N < a_k$ for all $1 \le k \le n$.⁹

14. Prove part of Theorem 4.1.14: If $S = \{a_k\}_{k=1}^n$ is a finite set of real numbers, then $S$
contains its LUB.¹⁰


¹ To prove $\Leftarrow$, let $M = |M_1| + |M_2|$. Use Exercise 14 from Section 2.3.
² Apply Theorem 4.1.6 to $x = r/\epsilon$.
³ First start by assuming $a > 0$. Worry about $a \le 0$ later.
⁴ Apply Theorem 4.1.7 to $b - a$.
⁵ If $1/n < b - a$, you can apply Theorem 4.1.7 to $a$ and $1/n$.
⁶ Use the WOP for the right $m$ such that $m/n > a$.
⁷ Apply the result of Exercise 7 to $a/\sqrt{2} < b/\sqrt{2}$. Why is the resulting $c$ irrational?
⁸ Let $M = \sum_{k=1}^n |a_k|$.
⁹ Get an $N_k$ for each $a_k$ that satisfies $1/N_k < a_k$, then let $N = \sum_{k=1}^n N_k$.
¹⁰ If $L$ is the LUB of $S$ and $L \notin S$, then you ought to be able to use the set $T = \{L - a_k\}_{k=1}^n$ and
Corollary 4.1.13 to contradict the assumption that $L$ is the LUB of $S$.
4.2 Sets in $\mathbb{R}$


4.2.1 Open and closed sets 


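Definition 4.2.1. Given $a \in \mathbb{R}$ and $\epsilon > 0$, we define the $\epsilon$-neighborhood of $a$ as

\[ N_\epsilon(a) = (a - \epsilon, a + \epsilon) = \{x \in \mathbb{R} : a - \epsilon < x < a + \epsilon\} = \{x \in \mathbb{R} : |x - a| < \epsilon\}, \tag{4.6} \]

and we say that the neighborhood has radius $\epsilon$.

Notice if $0 < \epsilon_1 < \epsilon_2$, then $N_{\epsilon_1}(a) \subseteq N_{\epsilon_2}(a)$. For if $x \in N_{\epsilon_1}(a)$, then

\[ a - \epsilon_2 < a - \epsilon_1 < x < a + \epsilon_1 < a + \epsilon_2, \tag{4.7} \]

so that $x \in N_{\epsilon_2}(a)$.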


Definition 4.2.2. A set $A \subseteq \mathbb{R}$ is called open if for every $a \in A$, there exists $\epsilon > 0$ such
that $N_\epsilon(a) \subseteq A$.


A set $A$ is open if every element of $A$ is in some interval contained entirely within $A$;
that is, every point in $A$ is, in a sense, insulated from the outside of $A$ by a neighborhood
that is a subset of $A$. In Exercise 1, you'll show that the intervals $(a, +\infty)$ and $(-\infty, b)$
are open.

As usual, one of the first questions you ought to ask yourself after a definition is 
presented is what its negation is. So what does it mean to say that A is not open? Logically, 


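\begin{align}
\neg(A \text{ is open}) &\Leftrightarrow \neg(\forall a \in A)(\exists \epsilon > 0)(N_\epsilon(a) \subseteq A) \notag\\
&\Leftrightarrow (\exists a \in A)(\forall \epsilon > 0)(N_\epsilon(a) \not\subseteq A) \tag{4.8}\\
&\Leftrightarrow (\exists a \in A)(\forall \epsilon > 0)(\exists x \in N_\epsilon(a))(x \notin A). \notag
\end{align}


Thus $A$ is not open if there is some point in $A$, every $\epsilon$-neighborhood of which contains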
some point not in A. That is, there is at least one point a € A that cannot, as it were, be 
insulated from the outside. 


Example 4.2.3. Show that the set $(a, b]$ is not open.


Solution: Consider the point $b$, and pick any $\epsilon > 0$. Then the point $b + \epsilon/2$ is in
$N_\epsilon(b)$ but not in $(a, b]$. ■
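
The solution hands you an explicit witness for every $\epsilon$, and a witness is something you can test. Here is a minimal Python sketch for the particular interval $(2, 3]$; the endpoints, the function names, and the list of trial radii are all assumptions made for illustration. Checking a handful of radii is no proof, of course. The argument above covers every $\epsilon > 0$ at once.

```python
from fractions import Fraction


def in_half_open(x, a, b):
    """True when x lies in the interval (a, b]."""
    return a < x <= b


def escape_point(b, eps):
    """The point b + eps/2 used in the solution of Example 4.2.3."""
    return b + Fraction(eps) / 2


a, b = 2, 3
checks = []
for eps in [1, Fraction(1, 10), Fraction(1, 1000)]:
    w = escape_point(b, eps)
    in_neighborhood = abs(w - b) < eps          # w is in the eps-neighborhood of b
    checks.append(in_neighborhood and not in_half_open(w, a, b))
print(checks)
```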


Two other sets besides $(a, +\infty)$ and $(-\infty, b)$ immediately present themselves as
open sets. The empty set is open simply because otherwise there would have to exist
some $x \in \emptyset$ with some property, which is a contradiction. $\mathbb{R}$ is open because for any
point $a \in \mathbb{R}$, we may let $\epsilon = 1$ and have $N_\epsilon(a) \subseteq \mathbb{R}$.

Let's address unions and intersections of open sets. If $A$ and $B$ are both open, do you
suspect $A \cup B$ is open? $A \cap B$?¹¹ In both cases the answer is yes, but when an arbitrary
family of open sets is combined by union or intersection, the answer can change. In
Exercise 4 you'll prove the following:


Theorem 4.2.4. The union of a family of open sets is open. The intersection of a finite 
family of open sets is open. 


With Theorem 4.2.4, it follows that if $a < b$, then the interval $(a, b)$ is open, for
$(a, b) = (-\infty, b) \cap (a, +\infty)$. The intersection of an infinite family of open sets certainly
could be open, but it might not be (Exercise 5).

If we were discussing doors or stores or minds, to say that one is closed would mean 
that it is not open. When it comes to sets, closed does not mean not open. 


Definition 4.2.5. A set $A \subseteq \mathbb{R}$ is said to be closed if $A'$ is open.


The interval $[a, b]$ is closed because $[a, b]' = (-\infty, a) \cup (b, +\infty)$, the union of two
open sets. Similarly, a set with one element, $\{a\}$, called a singleton, is closed because
$\{a\}' = (-\infty, a) \cup (a, +\infty)$. The proof of the following theorem should be quick (Exercise 6):


Theorem 4.2.6. The intersection of a family of closed sets is closed. The union across a finite 
family of closed sets is closed. 


Because of Theorem 4.2.6, it follows that any finite set is closed because it is a finite 
union of singleton sets. The integers provide a good example of an infinite set that is 
closed. (What is $\mathbb{Z}'$?) In Exercise 8, you'll demonstrate a union of closed sets that is not
closed. 

Many sets are neither open nor closed. For example, [a, b) is sometimes called a 
half open or half closed interval. Believe it or not, some sets are both open and closed. 
For example, $\mathbb{R}$ and $\emptyset$ are both open and closed because they are both open and they
are complementary. Theorem 4.2.7 uses the LUB axiom to show the important fact that 
these are the only two subsets of R that are both open and closed. The proof is a little 
intricate, so we provide it here. It is important for you to work your way through the 
details because it illustrates an important type of proof in which we show that certain 
examples of something are the only ones that exist. Keep a pencil and paper handy to 
sketch some number lines that will help you visualize the details. 


Theorem 4.2.7. The only subsets of $\mathbb{R}$ that are both open and closed are $\emptyset$ and $\mathbb{R}$.


Proof: We prove by contradiction. Suppose there exists $A \subseteq \mathbb{R}$ such that $A$ and $A'$
are both open and nonempty. Pick any $a_1 \in A$ and $a_2 \in A'$. Without any loss of
generality, we may assume $a_1 < a_2$. Let $E = \{\epsilon > 0 : N_\epsilon(a_1) \subseteq A\}$; that is, $E$ is
the set of all $\epsilon > 0$ such that the $\epsilon$-neighborhood of $a_1$ is contained entirely within
$A$. Since $A$ is open, $E$ is nonempty. Because $a_2 \in A'$, $E$ is bounded from above by
$a_2 - a_1$. Thus we may apply the LUB property to $E$ to conclude the existence of some
$\epsilon_0$, the LUB of $E$. Thus $N_{\epsilon_0}(a_1)$ is the largest neighborhood of $a_1$ that is contained
entirely within $A$.

In order to arrive at a contradiction, we look at the endpoints of the interval
$(a_1 - \epsilon_0, a_1 + \epsilon_0)$. First we ask if it is possible that $a_1 + \epsilon_0 \in A'$. If so, then the
fact that $A'$ is open means there exists $\epsilon_1 > 0$ such that $N_{\epsilon_1}(a_1 + \epsilon_0) \subseteq A'$. Because
$a_1 \in A$, we know that $\epsilon_1 \le \epsilon_0$. But then $a_1 + \epsilon_0 - \epsilon_1/2 \in A'$. This contradicts the fact
that $N_{\epsilon_0}(a_1) \subseteq A$. Thus $a_1 + \epsilon_0 \in A$. By an identical argument applied to $a_1 - \epsilon_0$,
we have that $a_1 - \epsilon_0 \in A$, so that $[a_1 - \epsilon_0, a_1 + \epsilon_0] \subseteq A$.

Now let's look again at the endpoints of $[a_1 - \epsilon_0, a_1 + \epsilon_0]$. Since $A$ is open
and $a_1 - \epsilon_0, a_1 + \epsilon_0 \in A$, there exist $\epsilon_2, \epsilon_3 > 0$ such that $N_{\epsilon_2}(a_1 - \epsilon_0) \subseteq A$ and
$N_{\epsilon_3}(a_1 + \epsilon_0) \subseteq A$. Letting $\epsilon_4 = \min\{\epsilon_2, \epsilon_3\}$, we have that $N_{\epsilon_0 + \epsilon_4}(a_1) \subseteq A$. This
contradicts the fact that $\epsilon_0$ is the LUB of $E$, for we have shown that $\epsilon_0 + \epsilon_4 \in E$. Thus
it is impossible that $A$ and $A'$ are both open and nonempty. ■


¹¹ Think intuitively in terms of open intervals.


Theorem 4.2.7 is important both in its own right and because it motivates the idea
of connectedness of sets in the area of mathematics called topology. By Theorem 4.2.7, if
$A$ is a nonempty, proper, open subset of $\mathbb{R}$, then $A'$ is a nonempty, proper subset of $\mathbb{R}$
that is not open. Thus Theorem 4.2.7 says that $\mathbb{R}$ cannot be written as the disjoint union
of two nonempty open sets. Another way to say this is that if $A, B \subseteq \mathbb{R}$ are both open
and disjoint, and if $\mathbb{R} = A \cup B$, then either $A = \emptyset$ or $B = \emptyset$. This property gives us the
imagery of the real number line as one connected piece, and as the proof reveals, follows from the LUB
axiom. In the more abstract setting of topology, a set is said to be connected by definition
if it cannot be written as the disjoint union of two nonempty, open sets.


4.2.2 Interior, exterior, and boundary 


Suppose $A \subseteq \mathbb{R}$ and we're given some $x \in \mathbb{R}$. Then precisely one of the following must
be true.


1. There exists $\epsilon > 0$ such that $N_\epsilon(x) \subseteq A$, in which case we say that $x$ is in the interior
of $A$, written $x \in \operatorname{Int}(A)$.


2. There exists $\epsilon > 0$ such that $N_\epsilon(x) \subseteq A'$, in which case we say that $x$ is in the exterior
of $A$, written $x \in \operatorname{Ext}(A)$.


3. For all $\epsilon > 0$, there exist $a_1, a_2 \in N_\epsilon(x)$ such that $a_1 \in A$ and $a_2 \in A'$, in which
case we say that $x$ is on the boundary of $A$, written $x \in \operatorname{Bdy}(A)$.


Note that if $x \in \operatorname{Int}(A)$, then $x \in A$. Also, if $x \in \operatorname{Ext}(A)$, then $x \notin A$. However, if
$x \in \operatorname{Bdy}(A)$, then $x$ might or might not be in $A$. The gist of $x \in \operatorname{Bdy}(A)$ is that every
$\epsilon$-neighborhood of $x$ contains at least one point from each of $A$ and $A'$, and $x$ is allowed
to serve as one of these two points, if necessary. Notice that $\operatorname{Bdy}(A) = \operatorname{Bdy}(A')$, and that
$\operatorname{Int}(A) = \operatorname{Ext}(A')$ and vice versa.


Example 4.2.8. Consider $A = (2, 3] \cup \{1/n\}_{n=1}^\infty$. Then $2.5 \in \operatorname{Int}(A)$ by letting $\epsilon = 0.25$,
$1.5 \in \operatorname{Ext}(A)$ by letting $\epsilon = 0.5$, and $2, 3, 0, 1/n \in \operatorname{Bdy}(A)$.
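
The two choices of $\epsilon$ in Example 4.2.8 can be checked by comparing endpoints. The Python sketch below does just that; all three function names are hypothetical, and `certainly_misses_A` is deliberately conservative: it reports only a simple sufficient condition for an open interval to miss $A$, which is all this example needs.

```python
from fractions import Fraction


def neighborhood(x, eps):
    """Endpoints (lo, hi) of the eps-neighborhood of x."""
    return (x - eps, x + eps)


def inside_2_3(lo, hi):
    """True when the open interval (lo, hi) is a subset of (2, 3]."""
    return 2 <= lo and hi <= 3


def certainly_misses_A(lo, hi):
    """Sufficient test that (lo, hi) contains no point of A = (2, 3] U {1/n}."""
    misses_interval = hi <= 2 or lo >= 3       # no overlap with (2, 3]
    misses_reciprocals = lo >= 1 or hi <= 0    # every 1/n lies in (0, 1]
    return misses_interval and misses_reciprocals


interior_ok = inside_2_3(*neighborhood(Fraction(5, 2), Fraction(1, 4)))
exterior_ok = certainly_misses_A(*neighborhood(Fraction(3, 2), Fraction(1, 2)))
print(interior_ok, exterior_ok)
```

For a boundary point such as 3, neither test can succeed for any radius, which is exactly what being on the boundary means.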


Theorem 4.2.9. $A \subseteq \mathbb{R}$ is open if and only if $\operatorname{Int}(A) = A$.

We noted in the preceding that $\operatorname{Int}(A) \subseteq A$, an immediate consequence of the
definition of $\operatorname{Int}(A)$. The proof of $\supseteq$ should be quick (Exercise 9). With Theorem 4.2.9,
we see that openness is logically equivalent to every point being an interior point.


EXERCISES 


1. Show that the intervals $(a, +\infty)$ and $(-\infty, b)$ are open.¹²


2. Let $\mathcal{F} = \{(0, 4), (10, 14), (2, 3), (9, 12), (2, 8)\}$. Write $\bigcup_{A \in \mathcal{F}} A$ in terms of as few
intervals as possible.


3. Let $\mathcal{F} = \{(1, 10), (-2, 8), (0, 20), (2, 15)\}$. Write $\bigcap_{A \in \mathcal{F}} A$ in terms of as few intervals
as possible.


4. Prove Theorem 4.2.4: Let $\mathcal{F} = \{A\}$ be a family of open sets, and let $\{B_k\}_{k=1}^n$ be a finite
family of open sets. Then

(a) $\bigcup_{A \in \mathcal{F}} A$ is open.
(b) $\bigcap_{k=1}^n B_k$ is open.

5. Show that the intersection of infinitely many open sets might not be open by showing
$\bigcap_{n=1}^\infty (-1/n, 1/n) = \{0\}$.


6. Prove Theorem 4.2.6: Let $\mathcal{F} = \{A\}$ be a family of closed sets, and let $\{B_k\}_{k=1}^n$ be a finite
family of closed sets. Then

(a) $\bigcap_{A \in \mathcal{F}} A$ is closed.¹³
(b) $\bigcup_{k=1}^n B_k$ is closed.

7. Show that if $a < b$, then $[a, b]$ is closed.


8. Give an example of an infinite family $\mathcal{F}$ of closed sets such that $\bigcup_{A \in \mathcal{F}} A$ is not closed.
Prove.


9. Prove Theorem 4.2.9: $A \subseteq \mathbb{R}$ is open if and only if $\operatorname{Int}(A) = A$.


¹² See Exercise 12 from Section 2.3 if you need to.
¹³ See Theorem 2.2.7 from Section 2.2 if you need to.


4.3 Limit Points and Closure of Sets


Given $A \subseteq \mathbb{R}$ and any $x \in \mathbb{R}$, we might want to know if elements of $A$ are densely
clustered around $x$.


Definition 4.3.1. A point $x \in \mathbb{R}$ is said to be a limit point of $A \subseteq \mathbb{R}$ if every
$\epsilon$-neighborhood of $x$ contains a point in $A$ other than $x$ itself. That is, for all $\epsilon > 0$, there
exists $a \in A \cap N_\epsilon(x)$, where $a \ne x$.


Nothing about Definition 4.3.1 requires a limit point of $A$ to be in $A$. For example,
the Archimedean property implies immediately that 0 is a limit point of $\{1/n\}_{n=1}^\infty$. If,
however, $x \in A$, then for $x$ to be a limit point of $A$, the definition requires that every
$\epsilon$-neighborhood contain an element of $A$ different from $x$. This motivates a new
term.


Definition 4.3.2. For $x \in \mathbb{R}$ and $\epsilon > 0$, the set $N_\epsilon(x) \setminus \{x\}$ is called the deleted
$\epsilon$-neighborhood of $x$, and is denoted $DN_\epsilon(x)$. Another way to write $DN_\epsilon(x)$ is
$\{y \in \mathbb{R} : 0 < |y - x| < \epsilon\}$.


With this new notation, we may say that $x$ is a limit point of $A$ if every deleted
$\epsilon$-neighborhood of $x$ contains an element of $A$; that is, for all $\epsilon > 0$, $A \cap DN_\epsilon(x) \ne \emptyset$.
Limit points have other common names, for example, cluster points or accumulation
points.

True to form, we ask what it means for $x$ not to be a limit point of $A$.

\begin{align}
\neg(x \text{ is a limit point of } A) &\Leftrightarrow \neg(\forall \epsilon > 0)(A \cap DN_\epsilon(x) \ne \emptyset) \notag\\
&\Leftrightarrow (\exists \epsilon > 0)(A \cap DN_\epsilon(x) = \emptyset). \tag{4.9}
\end{align}

That is, $x$ is not a limit point of $A$ if there is some $\epsilon$-neighborhood of $x$ that is devoid
of elements of $A$, other than, perhaps, $x$ itself. We'll prove Theorem 4.3.3 here and leave
the proof of Theorem 4.3.4 to you in Exercise 3.


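Theorem 4.3.3. If $A \subseteq \mathbb{R}$ is open, then every $x \in A$ is a limit point of $A$.


Proof: Suppose $A$ is open, pick any $x \in A$ and any $\epsilon > 0$. Since $A$ is open, there
exists $\epsilon_1 > 0$ such that $N_{\epsilon_1}(x) \subseteq A$. Let $\epsilon_2 = \min\{\epsilon, \epsilon_1\}$. Then the point
$x + \epsilon_2/2 \in A \cap DN_\epsilon(x)$. ■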
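
The proof is constructive: given $\epsilon$, it tells you exactly which point to use. The Python sketch below tries the recipe on one concrete case, the open set $A = (0, 1)$ with $x = 1/4$ and $\epsilon_1 = 1/4$. These values and the function name `limit_point_witness` are assumptions chosen for illustration, and a few trial values of $\epsilon$ only spot-check what the proof establishes for all of them.

```python
from fractions import Fraction


def limit_point_witness(x, eps, eps1):
    """The point x + eps2/2 from the proof, where eps2 = min(eps, eps1)."""
    eps2 = min(Fraction(eps), Fraction(eps1))
    return x + eps2 / 2


x, eps1 = Fraction(1, 4), Fraction(1, 4)   # N_{1/4}(1/4) = (0, 1/2) lies inside (0, 1)
results = []
for eps in [5, Fraction(1, 3), Fraction(1, 1000)]:
    w = limit_point_witness(x, eps, eps1)
    in_A = 0 < w < 1                        # w belongs to A = (0, 1)
    in_deleted_nbhd = 0 < abs(w - x) < eps  # w is in DN_eps(x), so w != x
    results.append(in_A and in_deleted_nbhd)
print(results)
```

Taking the minimum of $\epsilon$ and $\epsilon_1$ is what lets one point satisfy both requirements, no matter which of the two radii happens to be smaller.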


Notice that the converse of Theorem 4.3.3 is not true. For example, every point in 
[a, b] is a limit point, but [a, b] is not open. 


Theorem 4.3.4. A set is closed if and only if it contains all its limit points. 


4.3.1 Closure of sets

If $A \subseteq \mathbb{R}$ is not closed, we might want to close it off, so to speak, by finding the smallest
closed superset of $A$, if there is one. First we define a term for this smallest closed superset
of $A$ and then we address whether it exists, and if so, whether it's unique.


Definition 4.3.5. Suppose $A \subseteq \mathbb{R}$ is given, and suppose $C \subseteq \mathbb{R}$ is a set with the following
properties:


(C1) $A \subseteq C$.
(C2) $C$ is closed.
(C3) If $D \subseteq \mathbb{R}$ is closed and $A \subseteq D$, then $C \subseteq D$.


Then $C$ is called a closure of $A$, and is denoted $\overline{A}$.


Notice how Definition 4.3.5 lays down the characteristics that a set $C$ must have in
order for it to be a smallest closed superset of $A$. Property C1 guarantees that any set we
would call $\overline{A}$ does, in fact, contain all elements of $A$. Property C2 guarantees that it is
actually closed. Property C3 says that any other closed set containing all elements of $A$
will not be any smaller. That $\overline{A}$ exists uniquely is guaranteed by the following theorem,
which gives us one way to visualize its construction.


Theorem 4.3.6. Suppose A C R. Then A exists uniquely, and can be constructed as 


A= () S. (4.10) 


SDA 
s closed 


Thus if F is the family of all closed supersets of A, then the intersection across F 
satisfies properties C1-C3, and therefore has earned the right to be called A. Animportant 
question to ask about the construction in Eq. (4.10) is whether there even exist any sets 
in F. If there are no closed supersets of A, then Eq. (4.10) is a vacuous construction. But 
since R D A and R is closed, then this is not the case. You'll prove that the construction 
in Eq. (4.10) satisfies properties C1—C3 and that A is unique in Exercise 5. 

The construction of $\overline{A}$ in Theorem 4.3.6 is a way of closing off a set $A$ by intersecting
down from above, as it were, through closed sets containing $A$, as if we were shrink-wrapping
it. Another way to arrive at the same thing is to build the closure up from
below, as the following theorem asserts:


Theorem 4.3.7. $\overline{A} = \operatorname{Int}(A) \cup \operatorname{Bdy}(A)$.


The proof of Theorem 4.3.7 is left to you in Exercise 7. It says, in effect, that the 
closure of a set A can be thought of as merely A, some points of which might be interior 
points and some boundary points, with any missing boundary points thrown in. The 
easiest way to prove Theorem 4.3.7 is a bit surprising. There will be a hint in the exercises 
if you need it. Here are our last two theorems, whose proofs are left to you in the exercises. 


Theorem 4.3.8. If $A \subseteq B$, then $\overline{A} \subseteq \overline{B}$.
Theorem 4.3.9. $A$ is closed if and only if $A = \overline{A}$.


EXERCISES 


1. Prove that if $x$ is a limit point of $A$, then every $\epsilon$-neighborhood of $x$ contains infinitely
many elements of $A$.

2. Suppose $L$ is the LUB of a set $A$ and $L \notin A$. Show $L$ is a limit point of $A$.

3. Prove Theorem 4.3.4: A set is closed if and only if it contains all its limit points.

4. Show that $\{1/n : n \in \mathbb{N}\}$ is not closed.

5. Prove Theorem 4.3.6 by showing that the construction in Eq. (4.10) satisfies properties
C1-C3, and that $\overline{A}$ is unique.¹⁴

6. What does it mean to say $x \notin \overline{A}$?

7. Prove Theorem 4.3.7: $\overline{A} = \operatorname{Int}(A) \cup \operatorname{Bdy}(A)$.¹⁵

8. Determine the closure of the following sets.
(a) $\emptyset$
(b) $\mathbb{R}$
(c) $(a, b)$
(d) $(a, b) \cup (b, c)$

9. Prove Theorem 4.3.8: If $A \subseteq B$, then $\overline{A} \subseteq \overline{B}$.¹⁶

10. Prove Theorem 4.3.9: $A$ is closed if and only if $A = \overline{A}$.

11. One of the following statements is true, and for the other, one direction of subset
inclusion is true. Prove the three true subset inclusion statements, and provide a
counterexample to illustrate the falsity of the fourth.

(a) $\overline{A \cup B} = \overline{A} \cup \overline{B}$ ¹⁷
(b) $\overline{A \cap B} = \overline{A} \cap \overline{B}$


4.4 Compactness


There is a special kind of subset of $\mathbb{R}$ that deserves our attention now. Sets that are
compact have a fundamental feature that makes them very important if they serve as the
domain of a function whose domain and codomain are subsets of $\mathbb{R}$. Before we can define
compactness, we need another term first.


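Definition 4.4.1. Suppose $A \subseteq \mathbb{R}$ and $\mathcal{C}$ is a family of subsets of $\mathbb{R}$. If

\[ A \subseteq \bigcup_{S \in \mathcal{C}} S, \tag{4.11} \]

we say that $\mathcal{C}$ covers $A$, or is a cover of $A$. Expression (4.11) simply means that every $x \in A$
is in some $S \in \mathcal{C}$. If every $S \in \mathcal{C}$ is open, we call $\mathcal{C}$ an open cover. If $\mathcal{F} \subseteq \mathcal{C}$ also covers $A$, we
call $\mathcal{F}$ a subcover of $\mathcal{C}$.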


¹⁴ See Exercises 3 and 6 from Section 2.2, and Theorem 4.2.6.
¹⁵ Show $(\overline{A})^C = \operatorname{Ext}(A)$.
¹⁶ Apply Exercise 4c from Section 2.1 and show $(\overline{B})^C \subseteq (\overline{A})^C$.
¹⁷ Take a hint from the hint for Exercise 7.
Example 4.4.2. Let $\mathcal{C} = \{N_1(x) : x \in \mathbb{Q}\}$, the set of neighborhoods of radius 1 of all
rational numbers. Then $\mathcal{C}$ is an open cover of $\mathbb{R}$ because every real number is within 1
of some rational. Furthermore, $\{N_1(n) : n \in \mathbb{Z}\}$ is a subcover of $\mathcal{C}$ because every real
number is within 1 of some integer.


Covers and subcovers are of most interest to us when the cover $\mathcal{C}$ is an open cover
with infinitely many sets in it, and $\mathcal{F}$ is a subcover that has only finitely many sets from
$\mathcal{C}$ in it.

Example 4.4.3. Let $a \in \mathbb{R}$. Then the set

\[ \mathcal{C} = \{(-\infty, a - 1/n) \cup (a + 1/n, +\infty) : n \in \mathbb{N}\} \tag{4.12} \]

covers $\mathbb{R} \setminus \{a\}$, and $\mathcal{C}$ has no finite subcover (Exercise 3).


Example 4.4.4. The set $\{(-n, n) : n \in \mathbb{N}\}$ is a cover of $\mathbb{R}$ with no finite subcover
(Exercise 4).
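
To get a feel for why the covers in Examples 4.4.3 and 4.4.4 have no finite subcover, it can help to pick a finite subfamily and look for a point it misses. The following Python sketch does that for one arbitrary choice of indices. The function names, the index list, and the value of $a$ are illustrative assumptions, and checking one subfamily is only a spot-check, not a substitute for the proofs asked for in Exercises 3 and 4.

```python
from fractions import Fraction

def covered_by_symmetric(x, indices):
    """True if x lies in some interval (-n, n) with n in indices."""
    return any(-n < x < n for n in indices)

def covered_by_punctured(x, a, indices):
    """True if x lies in some set (-inf, a - 1/n) U (a + 1/n, +inf) with n in indices."""
    return any(x < a - Fraction(1, n) or x > a + Fraction(1, n) for n in indices)

def witness_symmetric(indices):
    """A real number missed by the finite subfamily {(-n, n) : n in indices}."""
    return max(indices)

def witness_punctured(a, indices):
    """A point other than a that is missed by the finite subfamily of Example 4.4.3."""
    return a + Fraction(1, max(indices))

finite_indices = [2, 7, 19]   # an arbitrary finite choice of indices (assumption)
a = Fraction(3)               # an arbitrary choice of the point a (assumption)

x = witness_symmetric(finite_indices)
print(x, covered_by_symmetric(x, finite_indices))

y = witness_punctured(a, finite_indices)
print(y, y != a, covered_by_punctured(y, a, finite_indices))
```

In both cases the witness is built from the largest index in the subfamily, which is the same idea a proof would use: finitely many indices always have a maximum, and the sets with larger indices are the ones needed to reach the witness.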


Now we're ready for the definition of compactness. 


Definition 4.4.5. A set $A \subseteq \mathbb{R}$ is said to be compact if every open cover of $A$ has a finite
subcover.


To demonstrate that a given $A \subseteq \mathbb{R}$ is compact from Definition 4.4.5 would appear
to be quite a task, for something rather powerful must be shown about every
conceivable open cover of $A$. Showing $A$ is not compact probably would not be particularly
difficult, for all we need to do is demonstrate some open cover of $A$ that
cannot be reduced to a finite subcover. Therefore, it might be fairly easy to eliminate 
entire classes of sets from the possibility of being compact, but showing certain entire 
classes of sets are compact is another story. Thankfully, someone has been down this 
road before and supplied us with the answer to the question of precisely which sets are 
compact. 


Theorem 4.4.6 (Heine-Borel). A set $A \subseteq \mathbb{R}$ is compact if and only if $A$ is closed and
bounded.


We'll piece together the proof of Theorem 4.4.6 in several stages. First, you'll prove
the $\Rightarrow$ direction in Exercise 5. To prove $\Leftarrow$, we use the following two theorems. We'll
prove the first one here, then you'll prove the second one in Exercise 6.


Theorem 4.4.7. The interval [a, b] is compact. 


Proof: Suppose $\mathcal{C}$ is an open cover of $[a, b]$. Construct $E \subseteq (a, b]$ in the following
way. For $x \in (a, b]$, let $x$ be in $E$ if and only if some finite subfamily of $\mathcal{C}$ covers the
interval $[a, x]$. First we show $E \neq \emptyset$. Since $\mathcal{C}$ covers $[a, b]$, there is some $S_1 \in \mathcal{C}$
such that $a \in S_1$. Since $S_1$ is open, there exists $\epsilon_1 > 0$ such that $N_{\epsilon_1}(a) \subseteq S_1$. Thus
the subfamily $\{S_1\}$ covers $[a, a + \epsilon_1/2]$, so that $a + \epsilon_1/2 \in E$ and $E \neq \emptyset$. Also, $E$ is
bounded above by $b$. Therefore, $E$ has LUB $L \leq b$. First we show that $L \in E$, then




we show that $L = b$ in order to have that the entire interval $[a, b]$ can be covered by
a finite subfamily of $\mathcal{C}$.

Suppose $L \notin E$. Since $\mathcal{C}$ is an open cover of $[a, b]$ and $L \leq b$, there exists $S_2 \in \mathcal{C}$
such that $L \in S_2$. Since $S_2$ is open, there exists $\epsilon_2 > 0$ such that $N_{\epsilon_2}(L) \subseteq S_2$.
Since $L$ is the LUB of $E$, $L - \epsilon_2/2 \in E$, so that $[a, L - \epsilon_2/2]$ can be covered by
a finite subfamily $\mathcal{F}_2 \subseteq \mathcal{C}$. But then $\mathcal{F}_2 \cup \{S_2\}$ is a finite subfamily of $\mathcal{C}$ that covers
$[a, L + \epsilon_2/2]$. This is a contradiction, so $L \in E$.

Now suppose $L < b$. Since some finite $\mathcal{F}_3 \subseteq \mathcal{C}$ covers $[a, L]$, there exists some
$S_3 \in \mathcal{F}_3$ such that $L \in S_3$. Since $S_3$ is open, there exists $\epsilon_3 > 0$ such that $N_{\epsilon_3}(L) \subseteq S_3$.
Let $x = \min\{L + \epsilon_3/2, b\}$. Then $x > L$, $x \in S_3$, and $x \in [a, b]$. Thus, the interval
$[a, x]$ can be covered by a finite subcover of $\mathcal{C}$. This contradicts the fact that $L$ is the
LUB of $E$. Thus $L < b$ is impossible, and $L = b$. ∎


Knowing that [a, b] is compact moves us very close to showing that any closed and 
bounded set is compact. The following theorem, which you'll prove in Exercise 6, will 
get us within $\epsilon$ of being finished.


Theorem 4.4.8. If $A \subseteq B$, where $A$ is closed and $B$ is compact, then $A$ is also compact.

Now we can put the pieces together and prove the Heine-Borel theorem.


Proof of Theorem 4.4.6. The $\Rightarrow$ direction is Exercise 5. To prove $\Leftarrow$, suppose $A$
is closed and bounded. Then there exists $M > 0$ such that $A \subseteq [-M, M]$. Since
$[-M, M]$ is compact and $A \subseteq [-M, M]$, then by Theorem 4.4.8 $A$ is compact. ∎


You might be a little disappointed that we went to all the trouble of defining a new 
term (compactness) in complicated terms of open covers and subcovers, when it turns out 
that compact sets are precisely those that are closed and bounded, two ideas we already 
have a handle on. Well, you would be right to say that there is no reason to define a new 
term if the class of things to which the term applies can be easily described in already 
familiar language. However, the point remains that closed and bounded sets have a very 
important feature: Every open cover can be reduced to a finite subcover. We need this 
feature in Section 5.7. 


EXERCISES 


1. What does it mean for $\mathcal{C}$ not to be a cover of $A$?
2. What does it mean for a cover $\mathcal{C}$ of a set $A$ not to have a finite subcover?


3. Verify the claim in Example 4.4.3: $\mathcal{C} = \{(-\infty, a - 1/n) \cup (a + 1/n, +\infty) : n \in \mathbb{N}\}$
covers $\mathbb{R} \setminus \{a\}$, and $\mathcal{C}$ has no finite subcover.


4. Verify the claim in Example 4.4.4: The set $\{(-n, n) : n \in \mathbb{N}\}$ is a cover of $\mathbb{R}$ with no
finite subcover.




5. Prove the $\Rightarrow$ direction of Theorem 4.4.6 in the following way:
(a) Suppose $A$ is not closed. Demonstrate an open cover of $A$ that has no finite
subcover.¹⁸

(b) Suppose $A$ is not bounded. Demonstrate an open cover of $A$ that has no finite
subcover.¹⁹


6. Prove Theorem 4.4.8: If $B$ is compact and $A \subseteq B$ is closed, then $A$ is also compact.²⁰


4.5 Sequences in $\mathbb{R}$


There are several ways to approach the subject of sequences. Let’s move from an informal 
to a more formal way. A sequence in $\mathbb{R}$ is an ordered list of numbers

\[ (a_1, a_2, a_3, \ldots, a_n, \ldots). \tag{4.13} \]


Listing the elements of a sequence might remind you of showing that a set $A$ is countable,
for listing the elements of $A$ exhaustively in the fashion of sequence (4.13) is equivalent
to constructing a function $f : \mathbb{N} \to A$, where $f(1) = a_1$, $f(2) = a_2$, etc. Perhaps we
can think of a sequence as a function $f : \mathbb{N} \to \mathbb{R}$. To say that a sequence is a function
might conjure up slightly different imagery from saying it is a list of numbers, but a little
thought will convince you that there is no difference. Let's create some notation. The
expression $(a_n)_{n=1}^{\infty}$ is one standard way of denoting a sequence. Beginning with term $a_1$
is a matter of convenience. There might be times it seems more natural to begin with $a_0$.
If there is a formula for $a_n$, say, for example, $a_n = 1/n$, we have

\[ \left(\frac{1}{n}\right)_{n=1}^{\infty} = \left(1, \frac{1}{2}, \frac{1}{3}, \ldots\right). \tag{4.14} \]

We must distinguish Eq. (4.14) from $\{1/n\}_{n=1}^{\infty}$, which is merely the set of elements of
the sequence in Eq. (4.14), and does not have the ordering of the elements as one of its
defining characteristics. Saying $a_n = 1/n$ makes the idea of a sequence as a function more
natural, for we are talking about nothing other than $f : \mathbb{N} \to \mathbb{R}$ defined by $f(n) = 1/n$,
and the set $\{1/n\}$ can then be thought of as the range of the sequence. Then, if we want
to visualize the sequence graphically, we can, as in Fig. 4.1. Such a graph will help in
visualizing limits in Section 4.6. Example 4.5.1 presents some examples of sequences
we'll refer to later.
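
If you like to experiment, the distinction between a sequence (a function on indices, with order and repetition) and its range (a mere set) can be sketched in a few lines of Python. This is only an illustration under our own naming: `terms`, `reciprocal`, and `alternating` are hypothetical helper names, and the second sequence is the one defined by $a_n = 1/2 + (-1)^n \cdot 1/2$.

```python
from fractions import Fraction
from itertools import count, islice

def terms(f, start=1):
    """Yield f(start), f(start + 1), ... in order, treating the sequence as a function on indices."""
    return (f(n) for n in count(start))

def reciprocal(n):
    # a_n = 1/n, kept exact with Fraction
    return Fraction(1, n)

def alternating(n):
    # a_n = 1/2 + (-1)^n * 1/2
    return Fraction(1, 2) + (-1) ** n * Fraction(1, 2)

first_recip = list(islice(terms(reciprocal), 5))
first_alt = list(islice(terms(alternating), 6))

print(first_recip)      # ordered list of terms
print(first_alt)        # ordered list of terms, with repeats
print(set(first_alt))   # the range seen so far: order and repetition are gone
```

The alternating sequence has infinitely many terms, yet its range contains only two numbers, which is exactly why the sequence and the set of its elements must not be confused.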


Example 4.5.1. 1. The sequence $(a_n)_{n=1}^{\infty}$ defined by $a_n = 1/2 + (-1)^n \cdot 1/2$ is the
sequence $(0, 1, 0, 1, 0, 1, \ldots)$.


2. Letting $a_n = n + (-1)^n$ for $n \geq 0$ generates $(1, 0, 3, 2, 5, 4, 7, 6, \ldots)$.


¹⁸ Use the characterization of closed sets from Theorem 4.3.4 and Exercise 3.

¹⁹ Use Exercise 4.

²⁰ If $\mathcal{C}$ is an open cover of $A$, then the fact that $A^C$ is open should suggest a way to expand $\mathcal{C}$ into an open
cover of $B$.
Figure 4.1 The sequence $a_n = f(n) = 1/n$.


3. If $a_n = \sin n\pi$, then we generate the sequence $(0)$.


4. We can define $a_n$ in cases. For $n \geq 0$, let

\[ a_n = \begin{cases} 0, & \text{if } n \text{ is even}, \\ n^2 - 1, & \text{if } n \text{ is odd}, \end{cases} \tag{4.15} \]

to generate the sequence $(0, 0, 0, 8, 0, 24, 0, 48, 0, 80, 0, \ldots)$.


5. If $a_n = n^2/2^n$ for $n \geq 1$, then we generate the sequence

\[ \left(\frac{1}{2}, 1, \frac{9}{8}, 1, \frac{25}{32}, \frac{36}{64}, \frac{49}{128}, \frac{64}{256}, \ldots\right). \]


4.5.1 Monotone sequences


An important class of sequences derives from the following definition:


Definition 4.5.2. A sequence $(a_n)_{n=1}^{\infty}$ is said to be increasing if $a_m \leq a_n$ whenever $m < n$.
It is said to be decreasing if $a_m \geq a_n$ whenever $m < n$. If $(a_n)$ is either increasing or decreasing,
then it is said to be monotone. If $a_m < a_n$ whenever $m < n$, then $(a_n)$ is said to be strictly
increasing. If $a_m > a_n$ whenever $m < n$, then $(a_n)$ is said to be strictly decreasing.


Because of the ordering properties of real numbers discussed in Section 2.3, it follows
that if $(a_n)$ is an increasing (or decreasing) sequence of positive real numbers, then $(1/a_n)$
is a decreasing (or increasing) sequence of all positive numbers. The result is similar for
sequences of negative real numbers. Knowing this makes the next example a little easier.




Example 4.5.3. Show that the sequence defined by $a_n = n/(n + 1)$ for $n \geq 1$ is strictly
increasing.


Solution: We show that the sequence $(1/a_n)$ is strictly decreasing. Suppose $m < n$.
Then $1/m > 1/n$, so that $1/a_m = 1 + 1/m > 1 + 1/n = 1/a_n$. Thus the sequence
$(1/a_n)$ is strictly decreasing, and since every $a_n$ is positive, $(a_n)$ is strictly increasing. ∎


Given a fixed $x > 0$, the sequence $(x^n)$ proves to be important. In Exercise 1, you'll
show that $(x^n)$ is increasing if $x > 1$ and decreasing if $0 < x < 1$.


4.5.2 Bounded sequences 


In a fashion identical to boundedness of subsets of $\mathbb{R}$, we can define boundedness of
sequences. 


Definition 4.5.4. Suppose $(a_n)$ is a sequence of real numbers. If there exists some $M_1 \in \mathbb{R}$
such that $a_n \leq M_1$ for all $n$, then the sequence is said to be bounded from above. Similarly,
if there exists $M_2 \in \mathbb{R}$ such that $a_n \geq M_2$ for all $n$, then $(a_n)$ is said to be bounded from
below. If there exists $M > 0$ such that $|a_n| \leq M$ for all $n$, then the sequence is said to be
bounded. If a sequence is not bounded, it is said to be unbounded.


The boundedness terms in Definition 4.5.4 read very much like those in Definition 4.1.1.
In fact, to say that a sequence is bounded from above, bounded from below, or
bounded is precisely the same as saying that the range of the sequence is bounded from 
above, bounded from below, or bounded. Furthermore, we may apply Theorem 4.1.2 to 
the range of a sequence to have the following: 


Theorem 4.5.5. A sequence $(a_n)$ is bounded if and only if it is bounded from above and
below. 


Example 4.5.6. From Example 4.5.1, we note without details the following:


1. The sequence $(1/2 + (-1)^n \cdot 1/2)$ is bounded above by $M_1 = 8$ and below by $M_2 = 0$. It is
therefore bounded.


2. The sequence $(n + (-1)^n)$ is not bounded from above; therefore it is not bounded. It is, however,
bounded from below by $M_2 = 0$.


4. The sequence defined in Eq. (4.15) is not bounded from above, thus not bounded.
It is bounded from below by $M_2 = -84$.


Example 4.5.7. We show that the sequence from Example 4.5.1 defined by $a_n = n^2/2^n$
is bounded. In Exercise 12 from Section 2.4, we showed that $2^n > n^2$ for all $n \geq 5$. Thus
$0 < n^2/2^n < 1$ for all $n \geq 5$. Let $M = \max\{a_1, a_2, a_3, a_4, 1\}$. We show that $0 < a_n \leq M$
for all $n \in \mathbb{N}$.
Pick $n \in \mathbb{N}$. If $1 \leq n \leq 4$, then clearly $0 < a_n \leq \max\{a_1, a_2, a_3, a_4\} \leq M$. If $n \geq 5$,
then $0 < a_n < 1 \leq M$. In either case, $0 < a_n \leq M$, so that the sequence is bounded.
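
The bound in Example 4.5.7 is concrete enough to compute. The following Python sketch evaluates $M = \max\{a_1, a_2, a_3, a_4, 1\}$ exactly and spot-checks the two inequalities the argument relies on. The range of indices tested (up to 200) is an arbitrary assumption, so this is a sanity check of the reasoning, not a proof for all $n$.

```python
from fractions import Fraction

def a(n):
    """The n-th term n^2 / 2^n, kept exact with Fraction."""
    return Fraction(n * n, 2 ** n)

# M = max{a_1, a_2, a_3, a_4, 1}, as chosen in the example
M = max([a(n) for n in range(1, 5)] + [Fraction(1)])
print(M)

# Spot-check 0 < a_n <= M on an arbitrary finite range of indices
print(all(0 < a(n) <= M for n in range(1, 201)))

# The inequality 2^n > n^2 that drives the case n >= 5, checked on the same range
print(all(2 ** n > n * n for n in range(5, 201)))
```

Notice that the finitely many early terms are handled one by one, while a single inequality takes care of every term from the fifth onward. That split is the whole technique.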


What does it mean for a sequence to be unbounded? The answer: $(a_n)$ is unbounded
if for all $M > 0$ there exists $n \in \mathbb{N}$ such that $|a_n| > M$.


Example 4.5.8. Show that the sequence defined by $a_n = \sqrt{n}$ is unbounded.


Solution: Pick $M > 0$, and let $n$ be any integer satisfying $n > M^2$. Then by Exercise 8
from Section 2.8, we have that $a_n = \sqrt{n} > \sqrt{M^2} = M > 0$, so that $|a_n| > M$. ∎


Since the sequence $(x^n)$ is decreasing for $0 < x < 1$, it is bounded above by 1. Since
every $x^n$ is positive, $(x^n)$ is bounded below by zero. Thus if $0 < x < 1$, the sequence $(x^n)$
is bounded. However, the next example shows that $(x^n)$ is unbounded if $x > 1$.


Example 4.5.9. Suppose $x > 1$. Then $(x^n)$ is unbounded.


Solution: Suppose $(x^n)$ is bounded, and let $L$ be the LUB of the sequence. Let
$\epsilon = x - 1$, which is positive. Then by property M2, there exists $n \in \mathbb{N}$ such that
$L - \epsilon < x^n \leq L$, so that $L < x^n + x - 1$. But since $x^n > 1$, we have that

\[ x^{n+1} = x^n \cdot x = x^n[1 + (x - 1)] = x^n + x^n(x - 1) > x^n + x - 1 > L. \tag{4.16} \]

This is a contradiction, so that $(x^n)$ is unbounded. ∎


To close out this introductory section on sequences, here’s an example that illustrates 
a technique we'll use in Section 4.6. Exercises 6-8 are similar. 


Example 4.5.10. Consider the sequence defined by $a_n = (n + 3)/(n - 1)$ for $n \geq 2$.
Find a value of $N$ such that

\[ |a_N - 1| < 0.001. \tag{4.17} \]

Does inequality (4.17) hold if $n > N$?

Solution: Expanding inequality (4.17) according to Theorem 2.3.19 and then solving
for $N$ yields the following logically equivalent inequality statements:

\[ -0.001 < \frac{N + 3}{N - 1} - 1 < 0.001 \]
\[ -0.001 < \frac{4}{N - 1} < 0.001 \]
\[ -0.001(N - 1) < 4 < 0.001(N - 1) \]
\[ N > -3999 \text{ and } N > 4001. \]

The inequality $N > 4001$ is stronger than $N > -3999$, so we may let $N = 4002$.
Furthermore, if $n > N$, then

\[ -0.001 < 0 < \frac{4}{n - 1} < \frac{4}{N - 1} < 0.001, \tag{4.18} \]

so that $|a_n - 1| < 0.001$ for all $n \geq N$. ∎
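
If you want to see the threshold in Example 4.5.10 at work, here is a minimal Python sketch that tests the inequality exactly on either side of $N = 4002$. The helper name `within_tolerance` and the upper end of the tested range are our own assumptions, and a finite check like this only illustrates the claim that the algebra proves.

```python
from fractions import Fraction

def a(n):
    """a_n = (n + 3) / (n - 1), defined for n >= 2."""
    return Fraction(n + 3, n - 1)

TOL = Fraction(1, 1000)   # the tolerance 0.001, kept exact

def within_tolerance(n):
    """True when |a_n - 1| < 0.001."""
    return abs(a(n) - 1) < TOL

print(within_tolerance(4001))   # |a_n - 1| = 4/4000 equals 0.001, so the strict inequality fails
print(within_tolerance(4002))   # |a_n - 1| = 4/4001 is below 0.001
print(all(within_tolerance(n) for n in range(4002, 10000)))
```

Using exact fractions matters here, because the term at $n = 4001$ sits exactly on the boundary and floating-point rounding could blur the strict inequality.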
EXERCISES


1. Let $x > 0$. Show that the sequence $(x^n)_{n=1}^{\infty}$ is strictly increasing if $x > 1$ and strictly
decreasing if $0 < x < 1$.²¹


2. Give an example of a sequence that is both increasing and decreasing. 
3. What does it mean to say that a sequence is not increasing? 


4. Determine whether the following sequences are increasing, decreasing, or neither. 
Prove your claims. 
(a) $(1 - 1/n)$
(b) (n’) 
(c) (1+ (-1)"/n’) 
5. Consider the sequence $(a_n)$ defined by

\[ a_n = \begin{cases} 10, & \text{if } n \text{ is odd}, \\ 1 - n^2, & \text{if } n \text{ is even}. \end{cases} \]

Prove that $(a_n)$ is bounded from above but not from below.


6. Let $(a_n)$ be defined as in Example 4.5.10, and let $\epsilon > 0$ be given. Find a value of $N$
(which will be in terms of $\epsilon$) such that $n \geq N$ guarantees $|a_n - 1| < \epsilon$.


7. Let $(a_n)$ be defined by $a_n = (n + 3)/(2n - 1)$, and suppose $\epsilon > 0$ is given. Find a
value of $N$ such that $|a_n - 1/2| < \epsilon$ for all $n \geq N$.


8. Let $(a_n)$ be defined by $a_n = n/(n + 1)$, and suppose $0 < \epsilon < 1$ is given. Find a value
of $N$ such that $|a_n| > \epsilon$ for all $n \geq N$.


9. Let $(a_n)$ be defined by $a_n = 1/(n^2 + 1)$, and suppose $\epsilon > 0$ is given. Find a value of
$N$ such that $|a_n| < \epsilon$ for all $n \geq N$.²²


10. Suppose $\epsilon > 0$, and $(a_n)$ and $(b_n)$ are two sequences with the following properties.
If $n \geq N_1$, then $|a_n - L_1| < \epsilon/2$, and if $n \geq N_2$, then $|b_n - L_2| < \epsilon/2$. Given this
information, you should be able to make a true statement by filling in the blanks:

\[ \text{If } n \geq \underline{\ ?\ }, \text{ then } |(a_n + b_n) - (L_1 + L_2)| < \underline{\ ?\ }. \tag{4.19} \]

Fill in the blanks with the appropriate numbers and then prove that the resulting
statement is true.

4.6 Convergence of Sequences 


One of the most important questions we can ask about a sequence of real numbers 
concerns what we call its convergence. The question has to do with whether the terms 
are “homing in on” or “approaching” some fixed real number L as n takes on larger 


²¹ See Exercise 9b from Section 2.4.
²² You'll have to treat the cases $\epsilon \geq 1$ and $\epsilon < 1$ separately.

and larger values, or perhaps increasing or decreasing without bound. In this section, we
discuss precisely what we mean by the notion of convergence and prove some important 
basic results about convergent sequences. 

It would be misleading not to say up front that the idea of convergence created no small 
amount of controversy and distress in the historical development of mathematics. It took 
quite some time for a satisfactory definition to be devised. In developing the calculus, Sir 
Isaac Newton assumed some pretty sweeping results concerning convergence in his work, 
only to have its details placed on firm footing later. Because the idea is a bit complicated, 
we'll work our way into it a little at a time. 


4.6.1 Convergence to a real number 
Figure 4.2 provides a first glimpse into the idea of convergence to some $L \in \mathbb{R}$. Given
some $\epsilon > 0$, the $\epsilon$-neighborhood of $L$ contains all the terms in the sequence beyond
some threshold term $a_N$. In order to visualize how you might prove that the terms of a
sequence $(a_n)$ are converging to $L$, imagine the following task. Someone gives you an
$\epsilon > 0$. You then take that $\epsilon$ value and do a quick calculation to determine some $N \in \mathbb{N}$
(that will be a function of $\epsilon$) with the property that the $N$th term and all those after it fall
in the $\epsilon$-neighborhood of $L$. If someone else then gives you an even smaller $\epsilon > 0$, you
can still find a threshold value $N$ with the same property, but it will probably be a higher
threshold. If you are given any $\epsilon > 0$ and are able to find some $N \in \mathbb{N}$ with the property
that all terms from the $N$th one onward fall in the $\epsilon$-neighborhood of $L$, then you will
have shown that the sequence converges to $L$.

Figure 4.2 A convergent sequence.

Before we give the definition of convergence, let's point out that you have already
done some of this sort of work in the Exercises of Section 4.5. In Exercise 6, you showed
that

\[ \text{If } n > 1 + \frac{4}{\epsilon}, \text{ then } \left|\frac{n + 3}{n - 1} - 1\right| < \epsilon. \]

That is, all terms with index larger than $1 + 4/\epsilon$ fall in the $\epsilon$-neighborhood of 1. In
Exercise 7, you showed that

\[ \text{If } n > \frac{1}{2} + \frac{7}{4\epsilon}, \text{ then } \left|\frac{n + 3}{2n - 1} - \frac{1}{2}\right| < \epsilon. \]

That is, all terms with index larger than $1/2 + 7/(4\epsilon)$ fall in the $\epsilon$-neighborhood of 1/2.
Having worked through this, we're ready for the definition of convergence.


Definition 4.6.1. Suppose $(a_n)_{n=1}^{\infty}$ is a sequence of real numbers. We say that the sequence
converges if there exists $L \in \mathbb{R}$ such that, for every $\epsilon > 0$, there exists $N \in \mathbb{N}$ with the property
that $n \geq N$ implies $|a_n - L| < \epsilon$. That is, the sequence converges iff

\[ (\exists L \in \mathbb{R})(\forall \epsilon > 0)(\exists N \in \mathbb{N})(\forall n \in \mathbb{N})(n \geq N \Rightarrow |a_n - L| < \epsilon). \tag{4.20} \]

If $(a_n)_{n=1}^{\infty}$ converges to $L$, we say that $L$ is the limit of the sequence as $n$ approaches infinity,
and we write this as

\[ \lim_{n \to \infty} a_n = L. \tag{4.21} \]

We also say that $a_n$ approaches $L$ as $n$ approaches infinity, and write this as

\[ a_n \to L \text{ as } n \to \infty. \tag{4.22} \]

If a sequence does not converge, we say it diverges.


To construct a proof of convergence is, as always, to follow the logical flow of the 
definition. A general form for such a proof looks something like this: 


Theorem 4.6.2 (Sample). The sequence $(a_n)_{n=1}^{\infty}$ converges to $L$.
Proof: Pick $\epsilon > 0$. Let $N$ be any integer satisfying $N > f(\epsilon)$ (where you have already
done the scratchwork to find what $f(\epsilon)$ should be). Then if $n \geq N$, we have
that

\[ |a_n - L| = f^{-1}(n) \leq f^{-1}(N) < \epsilon. \tag{4.23} \]

Thus $a_n \to L$ as $n \to \infty$. ∎


Here’s a cleaned up example. 


Example 4.6.3. Show that the sequence $((n + 1)/(4n + 3))$ converges to 1/4.
Solution: Pick $\epsilon > 0$. Let $N$ be any integer satisfying $N > 1/(16\epsilon) - 3/4$. Then if
$n \geq N$, we have

\[ \left|\frac{n + 1}{4n + 3} - \frac{1}{4}\right| = \frac{1}{16n + 12} \leq \frac{1}{16N + 12} < \epsilon. \tag{4.24} \]

Thus $(n + 1)/(4n + 3) \to 1/4$ as $n \to \infty$. ∎


If you wonder what inspired the statement $N > 1/(16\epsilon) - 3/4$, it's the result of working
backward from inequality (4.24) and solving for $N$ to produce a logically equivalent
inequality. This scratchwork is what makes the last step of Eq. (4.24) work.
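
The scratchwork behind Example 4.6.3 can be turned into a small computation: given $\epsilon$, produce a threshold $N$ and then test the terms from $N$ onward. The Python sketch below does this with exact fractions. The function name `threshold`, the sample value of $\epsilon$, and the number of terms tested are assumptions chosen for illustration, and the finite test only illustrates what the proof establishes for every $n \geq N$.

```python
from fractions import Fraction
from math import floor

def a(n):
    """a_n = (n + 1) / (4n + 3), kept exact with Fraction."""
    return Fraction(n + 1, 4 * n + 3)

L = Fraction(1, 4)

def threshold(eps):
    """Smallest natural number N with N > 1/(16*eps) - 3/4."""
    bound = 1 / (16 * eps) - Fraction(3, 4)
    return max(1, floor(bound) + 1)

eps = Fraction(1, 1000)   # a sample epsilon (assumption)
N = threshold(eps)
print(N)
print(all(abs(a(n) - L) < eps for n in range(N, N + 5000)))
```

A smaller $\epsilon$ passed to `threshold` produces a larger $N$, which matches the intuition that tighter neighborhoods need a later starting point.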

Specific examples of convergence are great, but we need generalized theorems addressing
convergence in order to develop a broader theory of convergence. The following
theorems can go a long way by giving us some basic building blocks and ways of combining
them. You'll prove the first one in Exercise 1.


Theorem 4.6.4. The limit of a sequence of real numbers is unique. That is, if $a_n \to L_1$
and $a_n \to L_2$, then $L_1 = L_2$.


Theorem 4.6.5. The constant sequence defined by $a_n = c$ for $c \in \mathbb{R}$ has limit $c$. That is,
$\lim_{n \to \infty} c = c$.


Proof: Pick $\epsilon > 0$. Let $N = 1$. Then if $n \geq N$, we have that

\[ |a_n - c| = |c - c| = 0 < \epsilon. \]
∎


Theorem 4.6.6. The sequence $1/n \to 0$ as $n \to \infty$.


You'll prove Theorem 4.6.6 in Exercise 2 by using the Archimedean property of R. 
You prove the next important theorem in Exercise 4. 


Theorem 4.6.7. A convergent sequence is bounded. 


The heart of the proof of Theorem 4.6.7 can be understood in the following way. If 
$(a_n)$ converges, its terms eventually settle down close to $L$, so that all of its terms beyond
a certain point are caught in $(L - 1, L + 1)$. Thus no matter how wildly $(a_n)$ might
jump around before this point, there are only finitely many terms to contain. This should 
suggest a value of M to serve as a bound. 

Now for the theorems that allow us to manipulate and combine sequences and their 
limits. 


Theorem 4.6.8. Suppose $\lim_{n \to \infty} a_n = L_1$ and $\lim_{n \to \infty} b_n = L_2$. Then
1. $\lim_{n \to \infty}(a_n + b_n) = L_1 + L_2$,
2. $\lim_{n \to \infty}(a_n b_n) = L_1 L_2$.


The proofs of the components of Theorem 4.6.8 are classic, and it would be a shame 
for you not to discover them for yourself (with some hints, of course). You'll do that in 
Exercise 6. 
Corollary 4.6.9. If $\lim_{n \to \infty} a_n = L$ and $c \in \mathbb{R}$, then $\lim_{n \to \infty}(c a_n) = cL$.
Proof: Let $b_n = c$ for all $n$ in Theorem 4.6.8 and apply Theorem 4.6.5. ∎


Corollary 4.6.10. If $a_n \to L_1$ and $b_n \to L_2$ as $n \to \infty$, then $(a_n - b_n) \to (L_1 - L_2)$
as $n \to \infty$.


Proof:

\[ \lim_{n \to \infty}(a_n - b_n) = \lim_{n \to \infty}[a_n + (-1)b_n] = \lim_{n \to \infty} a_n + \lim_{n \to \infty}(-1)b_n = L_1 + (-1)L_2 = L_1 - L_2. \tag{4.25} \]
∎


With an induction argument calling on Theorem 4.6.6 and part 2 of Theorem 4.6.8, 
you can show the following (Exercise 7): 


Corollary 4.6.11. If $p \in \mathbb{N}$, then $1/n^p \to 0$ as $n \to \infty$.


To combine two sequences $(a_n)$ and $(b_n)$ by division, we first need to do a little work
on the sequence $(1/b_n)$ by itself. We would like to be able to say something to the effect
that the limit of a quotient is the quotient of the limits. Well, we can, sort of, with some
preliminary work.

The sequence $(1/n)$ contains only positive terms. However, $1/n \to 0$, so every
$\epsilon$-neighborhood of zero contains some $1/n$. Thus, the terms of the sequence $(1/n)$
cannot be bounded away from zero. However, if the terms of a sequence are all nonzero
and converge to a nonzero limit, then we can bound its terms away from zero.


Theorem 4.6.12. Suppose $(b_n)$ is a sequence of nonzero real numbers such that $b_n \to L \neq 0$.
Then there exists $M > 0$ such that $|b_n| \geq M$ for all $n \in \mathbb{N}$.


You'll prove Theorem 4.6.12 in Exercise 8. To keep things simple, you'll prove it only
for the case $L > 0$. For the case $L < 0$, we can then make the following observation. If
$L < 0$, then $-b_n \to -L$. Since $-L > 0$, we can apply the positive case of the theorem to
$(-b_n)$ to conclude there exists $M > 0$ such that $|-b_n| \geq M$ for all $n \in \mathbb{N}$. But $|-b_n| = |b_n|$,
so $|b_n| \geq M$ for all $n$.

Theorem 4.6.12 helps us derive the following theorem, which you'll prove in 
Exercise 9. 


Theorem 4.6.13. Suppose $(b_n)$ is a sequence of nonzero real numbers such that $b_n \to L_2 \neq 0$.
Then $\lim_{n \to \infty} 1/b_n = 1/L_2$.


Corollary 4.6.14. Suppose $(a_n)$ is a sequence converging to $L_1$, and $(b_n)$ is a sequence of
nonzero real numbers converging to $L_2 \neq 0$. Then $a_n/b_n \to L_1/L_2$ as $n \to \infty$.
Proof: Apply part 2 of Theorem 4.6.8 to the sequences $(a_n)$ and $(1/b_n)$. ∎

Theorems 4.6.5 through Corollary 4.6.14 can take us a long way in dealing with
certain classes of sequences. Here's a pretty general result, one case of which you'll prove
in Exercise 10.
Theorem 4.6.15. Suppose $(a_n)$ is a sequence defined by

\[ a_n = \frac{b_r n^r + b_{r-1} n^{r-1} + \cdots + b_1 n + b_0}{c_s n^s + c_{s-1} n^{s-1} + \cdots + c_1 n + c_0}, \tag{4.26} \]

where all $b_k, c_k \in \mathbb{R}$, $b_r$ and $c_s$ are nonzero, and the denominator is nonzero for all $n$. Then
$a_n \to 0$ if $r < s$, and $a_n \to b_r/c_s$ if $r = s$.


Proof: We prove for the case $r < s$ and leave the case $r = s$ to you. We may take the
expression for $a_n$ in Eq. (4.26) and multiply by $n^{-s}/n^{-s}$ to have

\[ a_n = \frac{b_r n^{r-s} + b_{r-1} n^{r-1-s} + \cdots + b_1 n^{1-s} + b_0 n^{-s}}{c_s + c_{s-1} n^{-1} + \cdots + c_1 n^{1-s} + c_0 n^{-s}}. \tag{4.27} \]

Since $r < s$, every exponent of $n$ in Eq. (4.27) is negative, and we may apply results
of this section to obtain

\[ \begin{aligned}
\lim_{n \to \infty} a_n &= \frac{\lim(b_r n^{r-s} + \cdots + b_0 n^{-s})}{\lim(c_s + \cdots + c_0 n^{-s})} \\
&= \frac{\lim(b_r n^{r-s}) + \cdots + \lim(b_0 n^{-s})}{\lim c_s + \cdots + \lim(c_0 n^{-s})} \\
&= \frac{(\lim b_r)(\lim n^{r-s}) + \cdots + (\lim b_0)(\lim n^{-s})}{\lim c_s + \cdots + (\lim c_0)(\lim n^{-s})} \\
&= \frac{b_r \cdot 0 + \cdots + b_0 \cdot 0}{c_s + \cdots + c_0 \cdot 0} = \frac{0}{c_s} = 0.
\end{aligned} \tag{4.28} \]
∎
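
To watch Theorem 4.6.15 in action, you can evaluate a rational sequence at a few large indices. The Python sketch below does this for two made-up examples, one with $r < s$ and one with $r = s$. The helper `rational_term`, the coefficient lists, and the sample indices are all illustrative assumptions, and a handful of numerical values only suggests the limit rather than proving it.

```python
from fractions import Fraction

def rational_term(b, c, n):
    """Term a_n for coefficient lists b = [b_0, ..., b_r] and c = [c_0, ..., c_s]."""
    numerator = sum(bk * n ** k for k, bk in enumerate(b))
    denominator = sum(ck * n ** k for k, ck in enumerate(c))
    return Fraction(numerator, denominator)

# r < s: (3n + 1) / (2n^2 + 5), expected to approach 0
low_degree = [rational_term([1, 3], [5, 0, 2], 10 ** k) for k in range(1, 5)]

# r = s: (3n^2 + 1) / (2n^2 + 5), expected to approach b_r / c_s = 3/2
same_degree = [rational_term([1, 0, 3], [5, 0, 2], 10 ** k) for k in range(1, 5)]

print([float(t) for t in low_degree])
print([float(t - Fraction(3, 2)) for t in same_degree])
```

The second printed list shows the differences from 3/2 rather than the terms themselves, which makes it easier to see them shrinking toward zero.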


4.6.2 Convergence to $\pm\infty$
Sometimes a sequence does not converge to a real number, but it does exhibit a nice
behavior in that it increases or decreases without bound. The term we use for such
behavior is that $a_n \to +\infty$ (or $-\infty$). We have already used the somewhat loose expression
$n \to \infty$ in Definition 4.6.1, even though $\infty$ is not a real number. It is not difficult to define
the notion of $\lim_{n \to \infty} a_n = \infty$, but there is no way to do it in terms of neighborhoods,
unless you can concoct some sort of definition of a neighborhood of infinity. We'll do
exactly that in Section 5.4 in the context of functions from $\mathbb{R}$ to $\mathbb{R}$, where we'll spend
some quality time with infinity. For now, we'll be content to work with a few basic ideas.
If $a_n \to L$ as $n \to \infty$, the terms $a_n$ might hop all around $L$, just as long as that hopping
stays within an $\epsilon$-radius of $L$ beyond some threshold term. That this bouncing around
can be eventually contained for all $\epsilon > 0$ is why the bouncing apparently fades out. To say
$a_n \to +\infty$ as $n \to \infty$ means loosely that the terms $a_n$ are hopping increasingly higher
and higher, even though there is no reason the terms cannot do at least a little hopping
back down. Not too much hopping down, though. Instead of setting an arbitrary $\epsilon$-radius
around some $L$, we set an arbitrary hurdle bar at some $M > 0$ and insist that all terms
beyond some threshold $N$ are above the hurdle. If every $M > 0$ is associated with some
threshold beyond which $a_n > M$, then we say $a_n \to +\infty$ as $n \to \infty$. Although the
terms might not be increasing to $+\infty$ in the sense that the sequence eventually becomes
monotone, it is in some sense making its way to $+\infty$. Here's the definition.
Definition 4.6.16. Suppose $(a_n)$ is a sequence of real numbers. We say that $\lim_{n \to \infty} a_n = +\infty$
if, for all $M > 0$, there exists $N \in \mathbb{N}$ with the property that $n \geq N$ implies $a_n > M$.
Logically we may write this as

\[ (\forall M > 0)(\exists N \in \mathbb{N})(\forall n \in \mathbb{N})(n \geq N \Rightarrow a_n > M). \tag{4.29} \]

We also write $a_n \to +\infty$ as $n \to \infty$, and we sometimes say that the sequence increases
without bound.


Example 4.6.17. Show that $\lim_{n \to \infty} \sqrt{n} = +\infty$.


Solution: Pick $M > 0$ and let $N$ be any natural number satisfying $N > M^2$. Then
if $n \geq N$, we have that $\sqrt{n} \geq \sqrt{N} > \sqrt{M^2} = M$. ∎
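
The hurdle-bar picture in Example 4.6.17 is easy to try out: given a hurdle $M$, compute a threshold $N > M^2$ and confirm that the terms from $N$ onward clear the bar. In the Python sketch below, the names `threshold` and `sqrt_exceeds`, the sample hurdle, and the tested range are our own assumptions. To stay exact, the check compares $n$ with $M^2$ instead of computing a floating-point square root.

```python
from fractions import Fraction
from math import floor

def threshold(M):
    """Smallest natural number N with N > M^2, for a rational hurdle M > 0."""
    return floor(M * M) + 1

def sqrt_exceeds(n, M):
    """Decide sqrt(n) > M exactly by comparing n with M^2 (valid because M > 0)."""
    return n > M * M

M = Fraction(25, 2)   # an arbitrary hurdle, here 12.5 (assumption)
N = threshold(M)
print(N)
print(all(sqrt_exceeds(n, M) for n in range(N, N + 1000)))
```

Raising the hurdle raises the threshold, but some threshold always exists, and that is precisely what Definition 4.6.16 demands.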


Theorem 4.6.18. If $(a_n)$ is increasing and unbounded, then $a_n \to +\infty$ as $n \to \infty$.


You'll prove Theorem 4.6.18 in Exercise 14. An immediate consequence of Theorem 4.6.18
is that if $x > 1$, then $x^n \to +\infty$ as $n \to \infty$. In Exercise 1 (Section 4.5) you
showed that $(x^n)$ is increasing for $x > 1$, and Example 4.5.9 showed that it is unbounded.
With Theorem 4.6.19, it follows that if $0 < x < 1$, then $x^n \to 0$ as $n \to \infty$. You'll prove
it in Exercise 15.


Theorem 4.6.19. Suppose $a_n \to +\infty$ as $n \to \infty$. Then $1/a_n \to 0$ as $n \to \infty$.


EXERCISES 


1. Prove Theorem 4.6.4: The limit of a sequence of real numbers is unique. That is, if
$a_n \to L_1$ and $a_n \to L_2$, then $L_1 = L_2$.²³


2. Prove Theorem 4.6.6: $\lim_{n \to \infty} 1/n = 0$.


3. Show that $1/2^n \to 0$ as $n \to \infty$.²⁴


4. Prove Theorem 4.6.7: A convergent sequence is bounded.²⁵ ²⁶
5. Suppose $(a_n)$ converges to zero and $(b_n)$ is bounded. Show that $a_n b_n \to 0$.²⁷


6. Prove Theorem 4.6.8: Suppose $\lim_{n \to \infty} a_n = L_1$ and $\lim_{n \to \infty} b_n = L_2$. Then

²³ If $L_1 < L_2$, then letting $\epsilon = (L_2 - L_1)/2$ should yield a contradiction.

²⁴ For $n \geq 5$, $1/2^n < 1/n^2 < 1/n$.

²⁵ Look at the reasoning we used in Example 4.5.7, where we took care of finitely many terms
individually and then handled all the rest with a single number. A good start for this proof would be to
say: "Let $\epsilon = 1$. Then there exists ...."

²⁶ Let $M = \max\{|a_1|, |a_2|, \ldots, |a_{N-1}|, |L| + 1\}$.

²⁷ If $\epsilon > 0$ and $|b_n| \leq M$ for $M > 0$, then $\epsilon/M > 0$, also.
(a) $\lim_{n \to \infty}(a_n + b_n) = L_1 + L_2$,²⁸ ²⁹
(b) $\lim_{n \to \infty}(a_n b_n) = L_1 L_2$.³⁰ ³¹


7. Prove Corollary 4.6.11: If $p \in \mathbb{N}$, then $1/n^p \to 0$ as $n \to \infty$.


8. Prove one case of Theorem 4.6.12: Suppose $(b_n)$ is a sequence of nonzero real numbers
such that $b_n \to L > 0$. Then there exists $M > 0$ such that $|b_n| \geq M$ for all $n \in \mathbb{N}$.³²


9. Prove Theorem 4.6.13: Suppose $(b_n)$ is a sequence of nonzero real numbers such that
$b_n \to L_2 \neq 0$. Then $\lim_{n \to \infty} 1/b_n = 1/L_2$.


10. Prove the remaining case of Theorem 4.6.15: Suppose $(a_n)$ is a sequence defined by

\[ a_n = \frac{b_r n^r + b_{r-1} n^{r-1} + \cdots + b_1 n + b_0}{c_s n^s + c_{s-1} n^{s-1} + \cdots + c_1 n + c_0}, \]

where all $b_k, c_k \in \mathbb{R}$, $b_r$ and $c_s$ are nonzero, and the denominator is nonzero for all
$n$. Then $a_n \to b_r/c_s$ if $r = s$.


11. Suppose $a_n \to L_1$ and $b_n \to L_2$ as $n \to \infty$. Suppose also that $a_n \leq b_n$ for all $n \in \mathbb{N}$.
Show that $L_1 \leq L_2$.


12. Show that if $a_n \to L$ as $n \to \infty$, then $|a_n| \to |L|$ as $n \to \infty$.³³


13. From Corollary 4.6.10, if $a_n \to L$ and $b_n \to L$, then $(a_n - b_n) \to 0$. Thus, for any
$\epsilon > 0$, there exists $N \in \mathbb{N}$ such that $n \geq N$ implies $|a_n - b_n| < \epsilon/2$. Use this fact to
prove the Sandwich theorem: If $a_n \leq c_n \leq b_n$ for all $n$, and if $a_n \to L$ and $b_n \to L$,
then $c_n \to L$.


14. Prove Theorem 4.6.18: If $(a_n)$ is monotone increasing and unbounded, then $a_n \to +\infty$
as $n \to \infty$.


15. Prove Theorem 4.6.19: Suppose $a_n \to +\infty$ as $n \to \infty$. Then $1/a_n \to 0$ as $n \to \infty$.

16. Show that if $-1 < x < 0$, then $x^n \to 0$ as $n \to \infty$.


17. In the spirit of Definition 4.6.16, create a definition for the statement $\lim_{n \to \infty} a_n = -\infty$.


4.7 The Nested Interval Property


Imagine a sequence of closed intervals $([a_n, b_n])_{n=1}^{\infty}$, where $[a_n, b_n] \subseteq [a_m, b_m]$ whenever
$m < n$. Such a sequence of intervals is said to be nested because any particular interval
contains all those after it. In this section, we derive the nested interval property (NIP) of

28Remember, if € > 0, then so is € /2. See Exercise 10 from Section 4.5. 

2°Given € > 0, there exists N; € N such that |a, — L;| < €/2. Similarly for b,. 

5°If L; = 0, the result follows from Exercise 5. Otherwise, do some preliminary playing around with 
the expression |a,,b,, — L,L|. The sneaky trick here is adding and subtracting the same quantity inside 
|anb, — L,L2\. If you want to know what that quantity is, check the next hint. 

31 Add and subtract b, L; and use the triangle inequality. Also, apply Theorem 4.6.7 to by. 

3?Let € = L/2. Then there exists N € N such that.... 

33See Theorem 2.3.24. 
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R, which says that a nested sequence of closed intervals whose lengths (b, — a,) tend 
to zero as n — oO will intersect down to a single real number. We then investigate an 
important implication of the NIP as it relates to bounded sequences. We also demonstrate 
that the NIP can be used axiomatically to prove the LUB property, thus making the two 
statements logically equivalent. 


4.7.1 From LUB axiom to NIP 

If $([a_n, b_n])$ is a nested sequence of closed intervals, then clearly $a_m \le a_n \le b_n \le b_m$
whenever $m < n$. In particular, $a_1 \le a_n \le b_n \le b_1$ for all $n \in \mathbb{N}$. Thus, the sequence
of left endpoints $(a_n)$ is monotone increasing and bounded above by $b_1$, and the sequence
of right endpoints $(b_n)$ is monotone decreasing and bounded below by $a_1$. As
we begin to make our way to the NIP, we need the following theorem. You'll prove it in
Exercise 1.


Theorem 4.7.1. Suppose $(a_n)$ is an increasing sequence of real numbers that is bounded
from above. Then $(a_n)$ is convergent. Furthermore, if $\lim a_n = L$, then $a_n \le L$ for all
$n \in \mathbb{N}$.


Theorem 4.7.1 is the step in the process of proving the NIP where you will use the 
LUB axiom. The next theorem should follow quickly from Theorem 4.7.1 by the strategic 
insertion of some negative signs (Exercise 2). 


Theorem 4.7.2. Suppose $(b_n)$ is a decreasing sequence of real numbers that is bounded
from below. Then $(b_n)$ is convergent. Furthermore, if $\lim b_n = L$, then $b_n \ge L$ for all
$n \in \mathbb{N}$.


If the hypothesis conditions for Theorem 4.7.1 are satisfied, we say that $(a_n)$ converges
to $L$ from below, and we write $a_n \nearrow L$. Similarly for Theorem 4.7.2, we say $(b_n)$ converges
to $L$ from above, and we write $b_n \searrow L$.

Therefore we have that a nested sequence of closed intervals $([a_n, b_n])$ has the property
that $a_n \to L_1$ and $b_n \to L_2$ for some $L_1, L_2 \in \mathbb{R}$. If $([a_n, b_n])$ also has the property that
the lengths of the intervals approach zero as $n \to \infty$, then we have $\lim_{n\to\infty}(b_n - a_n) = 0$.
Because $(a_n)$ and $(b_n)$ converge individually, then by Corollary 4.6.10,


\[ 0 = \lim_{n\to\infty}(b_n - a_n) = \lim_{n\to\infty} b_n - \lim_{n\to\infty} a_n = L_2 - L_1. \tag{4.30} \]


Thus $L_1 = L_2$. The last piece of the NIP puzzle asserts that this limit is the single element
in the intersection of all the nested intervals in the sequence. Here is a statement of the
theorem and its proof, with its last detail left to you in Exercise 3.


Theorem 4.7.3 (Nested Interval Property). Let $([a_n, b_n])$ be a sequence of nested intervals
with the property that $\lim_{n\to\infty}(b_n - a_n) = 0$. Then there exists $x_0 \in \mathbb{R}$ such that


\[ \bigcap_{n=1}^{\infty} [a_n, b_n] = \{x_0\}. \tag{4.31} \]


Proof: By Theorems 4.7.1 and 4.7.2, and by Corollary 4.6.10, $(a_n)$ and $(b_n)$ converge
to some $L \in \mathbb{R}$. By Exercise 3, $\bigcap_{n=1}^{\infty} [a_n, b_n] = \{L\}$. ∎


Don’t think that the hypothesis condition $(b_n - a_n) \to 0$ by itself allows you to conclude
that $\lim a_n = \lim b_n$. Convergence of $(a_n)$ and $(b_n)$ separately must be demonstrated
apart from this in order to be able to apply Corollary 4.6.10. Just because $(b_n - a_n) \to 0$,
no conclusion can be drawn about the convergence of either $(a_n)$ or $(b_n)$. For example, let
$a_n = n$ and $b_n = n + 1/n$. The lengths of the intervals $[a_n, b_n]$ tend to zero. However,
the intervals are not nested, and neither $(a_n)$ nor $(b_n)$ converges individually.
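
The warning above can be checked on the first few intervals. The following is a minimal sketch, assuming the example endpoints $a_n = n$ and $b_n = n + 1/n$; the names `a`, `b`, `lengths`, and `is_nested` are illustrative only, and a finite check is of course not a proof.

```python
from fractions import Fraction

def a(n: int) -> Fraction:
    """Left endpoint a_n = n."""
    return Fraction(n)

def b(n: int) -> Fraction:
    """Right endpoint b_n = n + 1/n."""
    return n + Fraction(1, n)

indices = range(1, 9)

# Interval lengths b_n - a_n = 1/n shrink toward zero ...
lengths = [b(n) - a(n) for n in indices]

# ... but nesting would require a_n <= a_(n+1) and b_(n+1) <= b_n for every n.
is_nested = all(a(n) <= a(n + 1) and b(n + 1) <= b(n) for n in indices)

print([str(length) for length in lengths])
print(is_nested)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the lengths come out as exactly $1/n$ rather than as rounded floating-point values.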


4.7.2 The NIP applied to subsequences 
In Section 4.8, we'll need some results based on taking a given sequence $(a_n)$ and creating
from it a new sequence by perhaps deleting some of its terms and preserving the order
of the kept terms. The new sequence created in this way is called a subsequence of $(a_n)$.
We introduce the term here and note what the NIP has to say about certain sequences
and subsequences.

Creating a subsequence of $(a_n)$ can make for a slight notational mess when it comes
to indexing the subsequence, so let’s look at an example. Given a sequence $(a_n)$, suppose
we want to consider the following subsequence:


\[ (a_2, a_8, a_{10}, a_{40}, a_{44}, a_{66}, \ldots). \tag{4.32} \]


The notational dilemma can be resolved by noting that the indices


\[ (2, 8, 10, 40, 44, 66, \ldots) \tag{4.33} \]

form a strictly increasing sequence $(n_k)_{k=1}^{\infty}$ of natural numbers. If we denote $n_1 = 2$,
$n_2 = 8$, $n_3 = 10$, etc., then the sequence $(n_1, n_2, n_3, \ldots)$ is a way of indexing our
subsequence. The subsequence can then be denoted by

\[ (a_{n_1}, a_{n_2}, a_{n_3}, a_{n_4}, \ldots) = (a_{n_k})_{k=1}^{\infty}, \tag{4.34} \]


where $k$ is now the counter variable. This brings us to a definition:


Definition 4.7.4. Suppose $(a_n)_{n=1}^{\infty}$ is a sequence of real numbers and suppose $(n_k)_{k=1}^{\infty}$
is a strictly increasing sequence of natural numbers. Then the sequence $(a_{n_k})_{k=1}^{\infty}$ is called a
subsequence of the sequence $(a_n)$.
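
The double indexing in Definition 4.7.4 may be easier to see in code. This is a small sketch under stated assumptions: the sequence $a_n = 1/n$ is a hypothetical choice made only for illustration, and the list `n` holds the first six indices from (4.33), so that `n[k - 1]` plays the role of $n_k$.

```python
from fractions import Fraction

def a(n: int) -> Fraction:
    """Hypothetical sequence a_n = 1/n, chosen only for illustration."""
    return Fraction(1, n)

# Index sequence n_1 = 2, n_2 = 8, n_3 = 10, ... from (4.33).
n = [2, 8, 10, 40, 44, 66]

# Definition 4.7.4 requires the indices to be strictly increasing.
strictly_increasing = all(n[k] < n[k + 1] for k in range(len(n) - 1))

# The subsequence (a_{n_k}): k is the counter, n_k selects the term of (a_n).
subsequence = [a(n_k) for n_k in n]

print(strictly_increasing)
print([str(term) for term in subsequence])
```

The counter `k` runs over positions in the list `n`, while the values stored in `n` pick out terms of the original sequence, which mirrors the notation $(a_{n_k})_{k=1}^{\infty}$.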


Suppose you love to play darts. You play every day from now to eternity, and you're
pretty good at it. At least you're good enough so that you always hit somewhere on the
target. If every dart you throw leaves a tiny hole behind, what must eventually happen to
the set of pinholes on the target as time goes on? Well, it is certainly possible that your
darts could always land on one of a finite number of spots on the target, in which case
you would hit at least one spot infinitely many times. But if you don’t hit any one spot
infinitely many times, then your target will have infinitely many holes in it. If so, then the
finite size of the target will imply that the pinholes are going to cluster in one or more
places. Of course, if we think of two direct hits on the same exact spot as leaving two
distinguishable holes (why not?), then it can indisputably be said of your target that the
holes are somewhere clustered, and around at least one spot.

This little analogy is going somewhere: Suppose $(a_n)$ is a sequence of real numbers
that is bounded by $M > 0$. Then the interval $[-M, M]$ is the dart board target, and the
terms $a_n$ are the spots you hit. The next theorem effectively says that there are infinitely
many terms of the sequence clustered around some point in $[-M, M]$, but it does so in
the language of subsequences. It will come in handy in Exercise 4, where you will prove
the famous Bolzano-Weierstrass theorem.


Theorem 4.7.5. Every bounded sequence of real numbers contains a convergent subsequence.


Proof: Suppose $(a_n)$ is bounded by $M$ and denote $[c_0, d_0] = [-M, M]$. (See Fig. 4.3.)
Let $m_0$ be the midpoint of $[c_0, d_0]$, and note that either $[c_0, m_0]$ or $[m_0, d_0]$ must
contain infinitely many of the $a_n$. (Perhaps both do.) Let $[c_1, d_1]$ be one of the
subintervals of $[c_0, d_0]$ that contains infinitely many of the $a_n$, pick any term of the
sequence from $[c_1, d_1]$, and denote it $a_{n_1}$. Iterating this process, and letting $m_1$ be
the midpoint of $[c_1, d_1]$, we may let $[c_2, d_2]$ be whichever of $[c_1, m_1]$ or $[m_1, d_1]$
contains infinitely many of the $a_n$, then pick any term of the sequence from $[c_2, d_2]$
whose index exceeds $n_1$, and call this term $a_{n_2}$. Continuing in this fashion, we generate
two things: a nested sequence of intervals $[c_k, d_k]$ whose lengths are $2M/2^k$ and
therefore tend to zero (see Exercise 3 from Section 4.6), and $(a_{n_k})_{k=1}^{\infty}$, a subsequence
of $(a_n)_{n=1}^{\infty}$, each term of which falls in $[c_k, d_k]$.

By the NIP, $\bigcap_{k=1}^{\infty} [c_k, d_k] = \{x_0\}$ for some $x_0 \in \mathbb{R}$. Since $c_k \to x_0$ and $d_k \to x_0$ as
$k \to \infty$, and since $c_k \le a_{n_k} \le d_k$ for all $k \in \mathbb{N}$, the Sandwich theorem (Exercise 13
from Section 4.6) implies that $a_{n_k} \to x_0$. ∎


Figure 4.3 Nested intervals generated in the proof of Theorem 4.7.5.


4.7.3 From NIP to LUB axiom 


Let’s go back and note how the LUB axiom has contributed to the results developed thus 
far. We said in Section 2.8 that assumption A22 can be derived from the LUB axiom, but 
we have not proved this is true. All the results of Section 2.8 are based on A22, hence 
they depend on the LUB axiom as well. Certainly a lot of our work so far in this chapter 
has relied on the LUB axiom. Section 4.1 was all about the LUB axiom. In Section 4.2, 
Theorem 4.2.7 used the LUB axiom to show that $\emptyset$ and $\mathbb{R}$ are the only subsets of $\mathbb{R}$ that
are both open and closed. In Section 4.5, the only places we used any results from the LUB 
axiom were in choosing the maximum value of a finite set in proving some results about 
boundedness. Because of this, the theorems from Section 4.6 involving boundedness 
stem from the LUB axiom. In what follows, we want to delete the LUB axiom from our 
list of assumptions and replace it with the NIP. We can then show as a theorem that R has 
the LUB property. To do this, we retain our terminology of sequences and convergence. 
Assuming the NIP, here is the theorem we want to prove. 


Theorem 4.7.6. Suppose $A \subseteq \mathbb{R}$ is nonempty and bounded from above by $M$. Then there
exists $L \in \mathbb{R}$ such that $L$ is the least upper bound of $A$.


If you're inclined to tackle the proof of Theorem 4.7.6 without any guidance, go right
ahead (Exercise 5). Just make sure you understand what the task is. You have to use the
NIP to prove the LUB property of $\mathbb{R}$ as a theorem. That is, somehow you're going to use
what you know about $A$ to generate a sequence of nested intervals whose lengths tend
to zero, so that you can apply the NIP to it. If you want more guidance, read on for a
thumbnail sketch of the proof.

First, because $A \ne \emptyset$, we can choose some $x \in A$. We then split the interval $[x, M]$ in
half repeatedly. At each stage, we keep the right half of the split interval if it contains any
elements of $A$; otherwise, we keep the left half. (See Fig. 4.4.) The successive splits will
produce a sequence of nested intervals whose lengths tend to zero. The NIP then gives
us some $x_0$ in the intersection of all the intervals, which you can show has properties L1
and L2 from page 135 (or M1 and M2 if you prefer).
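
The halving procedure in this sketch can be tried on a concrete set. What follows is only an illustration, assuming the hypothetical set $A = \{x : x \ge 0 \text{ and } x^2 < 2\}$ with starting point $x = 1$ and upper bound $M = 2$. The helper `right_half_meets_A` is an assumption tailored to this particular $A$; a different set would need its own test for whether the right half contains elements of $A$.

```python
from fractions import Fraction

def in_A(x: Fraction) -> bool:
    """Membership test for the illustrative set A = {x : x >= 0 and x*x < 2}."""
    return x >= 0 and x * x < 2

def right_half_meets_A(m: Fraction, b: Fraction) -> bool:
    """Does [m, b] contain an element of A?

    For this particular A (an interval starting at 0) and m >= 0, that happens
    exactly when m itself belongs to A. This shortcut is specific to the example.
    """
    return in_A(m)

def nested_intervals(x: Fraction, M: Fraction, steps: int):
    """Halve [x, M] repeatedly, keeping the right half whenever it meets A."""
    a, b = x, M
    intervals = [(a, b)]
    for _ in range(steps):
        m = (a + b) / 2
        if right_half_meets_A(m, b):
            a = m      # keep the right half [m, b]
        else:
            b = m      # keep the left half [a, m]
        intervals.append((a, b))
    return intervals

intervals = nested_intervals(Fraction(1), Fraction(2), steps=30)
a_last, b_last = intervals[-1]
print(float(a_last), float(b_last))
```

After 30 halvings the interval has length $1/2^{30}$, its left endpoint is still in $A$, and its right endpoint is still an upper bound for $A$. In this example the shrinking intervals close in on $\sqrt{2}$, which is the least upper bound of this $A$.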


EXERCISES 


1. Prove Theorem 4.7.1: If $(a_n)$ is an increasing sequence of real numbers that is bounded
from above, then $(a_n)$ is convergent. Furthermore, if $\lim a_n = L$, then $a_n \le L$ for all
$n \in \mathbb{N}$.$^{34}$


2. Prove Theorem 4.7.2: If $(b_n)$ is a decreasing sequence of real numbers that is bounded
from below, then $(b_n)$ is convergent. Furthermore, if $\lim b_n = L$, then $b_n \ge L$ for all
$n \in \mathbb{N}$.$^{35}$


3. Suppose $([a_n, b_n])$ is a nested sequence of closed intervals with the property that $(a_n)$
and $(b_n)$ both converge to $L$. Show that $\bigcap_{n=1}^{\infty} [a_n, b_n] = \{L\}$.


4. Prove the Bolzano-Weierstrass theorem: Every bounded, infinite set of real numbers
has a limit point in $\mathbb{R}$.$^{36}$


5. Assuming the NIP, prove Theorem 4.7.6: Suppose $A \subseteq \mathbb{R}$ is nonempty and bounded
from above by $M$. Then there exists $L \in \mathbb{R}$ such that $L$ is the least upper bound of $A$.$^{37}$


Figure 4.4 Nested intervals generated in the proof of Theorem 4.7.6.


34 The fact that $a_n \le L$ for all $n$ should be a mere observation based on where you obtain $L$.


35 Apply Theorem 4.7.1 to $(-b_n)$.

36 You can apply Theorem 4.7.5 by creating just the right sequence from the set, or you can mimic the
proof of Theorem 4.7.5 in its use of the NIP.

37 Prove that $x_0 \in \bigcap [a_n, b_n]$ has properties L1 and L2 by contradiction.


4.8 Cauchy Sequences


Now we turn our attention to a certain class of sequences called Cauchy sequences. If we
can say that convergent sequences are characterized by the fact that the terms all eventually
bunch up around some real number $L$, then we would say that Cauchy sequences are
characterized by the fact that the terms all eventually bunch up around each other, without
any specific reference to any possible limit in $\mathbb{R}$. It turns out that this bunching of terms
is logically equivalent to convergence to some $L \in \mathbb{R}$ (with the help of the NIP), and this
is the main result we want to prove in this section. Once we have worked through these
results, we discuss why Cauchy sequences are important, and we relate their convergence
to the LUB axiom and the NIP.


4.8.1 Convergence of Cauchy sequences 


First let’s define precisely what we mean for a sequence to be Cauchy. Then we'll show 
that a sequence is convergent if and only if it is Cauchy. 


Definition 4.8.1. Suppose $(a_n)$ is a sequence of real numbers. Then $(a_n)$ is said to be
Cauchy if, for all $\epsilon > 0$, there exists $N \in \mathbb{N}$ such that, for all $m, n \ge N$, we have that
$|a_m - a_n| < \epsilon$.


One way to envision a Cauchy sequence in terms of some given $\epsilon > 0$ and the corresponding
threshold $N$ is that all terms which have an index of at least $N$ fall within an
$\epsilon$ distance, not only of $a_N$, but of each other as well. (See Fig. 4.5.) There is no apparent
limit around which the neighborhood is anchored, so the question of whether a Cauchy
sequence must converge is not immediately answerable. In showing that convergence
and Cauchy are equivalent, let’s do the easy part first.


Figure 4.5 A Cauchy sequence.


Theorem 4.8.2. A convergent sequence is Cauchy. 


Theorem 4.8.2 says that, if the terms of a sequence bunch up around some L € R, 
then they must bunch up around each other. You'll prove this in Exercise 1. 

What does it mean for a sequence not to be Cauchy (Exercise 2)? If a sequence is not 
Cauchy, then by Theorem 4.8.2 it does not converge. Sometimes a natural way to show 
a particular sequence diverges is to show it is not Cauchy. In Exercise 3 you'll show that
the sequence $((-1)^n)$ is not Cauchy and hence it diverges.

To prove that a Cauchy sequence is convergent, we need the following theorem: 


Theorem 4.8.3. A Cauchy sequence is bounded. 


You can prove Theorem 4.8.3 in a way almost identical to the proof of Theorem 4.6.7
where you bounded all but a finite number of terms within some neighborhood of
the limit. Instead of fencing in the tail of the sequence around a known limit, a Cauchy
sequence can be bounded by working around the threshold term $a_N$ for $\epsilon = 1$ (Exercise 4).
At this point, you have all the machinery you need to show that a Cauchy sequence
is convergent. Here’s a statement of the theorem.


Theorem 4.8.4. A Cauchy sequence of real numbers is convergent. 


You'll prove Theorem 4.8.4 in Exercise 5. The next paragraph explains the approach, 
which you're welcome to skip over if you don’t want the hint. 

If a sequence is Cauchy, then by Theorem 4.8.3, it is bounded. Since it is bounded, then
it contains a convergent subsequence by Theorem 4.7.5. This gives you some $L \in \mathbb{R}$ to
work with, and you can show the entire sequence converges to this $L$ by using convergence
of the subsequence and Cauchiness of the sequence.

The real numbers are only one example of a set where we address the convergence 
of Cauchy sequences. In any set endowed with a metric (a measure of distance between 
elements), Cauchy sequences can be defined in the same way as in Definition 4.8.1 by 
interpreting $|a_m - a_n|$ as the distance between elements in that context. Any such metric
space, as we call it, where Cauchy sequences always converge to an element in the space 
is said to be complete. 

In some situations, completeness might be taken as an axiom of the metric space. 
What we have done here is to show completeness of R as a theorem, based on the NIP, 
hence on the LUB axiom. Some authors of texts in analysis prefer to go the other way by 
assuming completeness as an axiom of R and demonstrating the NIP and LUB property 
as theorems. Of course, this means that completeness of R is logically equivalent to both 
the LUB property and the NIP, and we will show that completeness of R implies the NIP. 
By Theorem 4.7.6, completeness also implies the LUB property. 

One good reason an author might want to take completeness as an axiom is that it 
makes for an interesting way to fill in the spaces between the rational numbers with the 
irrationals and make the real number line into the smooth continuum we envision. In a 
nutshell, this is how it goes. 

Build up the rational numbers through $\{0, 1\}$, $\mathbb{N}$, $\mathbb{W}$, $\mathbb{Z}$, and $\mathbb{Q}$ in the way we discussed
in Section 2.6. Envision $\mathbb{Q}$ as the set of all rational points on the number line with all the
irrationals missing. Now imagine the set of all possible sequences of rational numbers
only. Many of the Cauchy sequences in $\mathbb{Q}$ will not converge in $\mathbb{Q}$. There are several
unproved assumptions in the next example, but it still illustrates this point.


Example 4.8.5. The terms

\[
\begin{aligned}
a_0 &= 1 \\
a_1 &= 1.4 \\
a_2 &= 1.41 \\
a_3 &= 1.414 \\
&\;\;\vdots \\
a_{11} &= 1.41421356237
\end{aligned}
\]

represent the start of a sequence that converges to $\sqrt{2}$. Every term in the sequence is
rational because it is a terminating decimal. The sequence is Cauchy, for, given $\epsilon > 0$, let
$N \in \mathbb{N}$ satisfy $10^{-N} < \epsilon$. Then if $m, n \ge N$, $|a_m - a_n| < \epsilon$ because $a_m$ and $a_n$ will agree
through the $N$th decimal place.
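
The claim in Example 4.8.5 can be spot-checked with exact rational arithmetic. This is a minimal sketch, assuming $a_k$ means $\sqrt{2}$ truncated to $k$ decimal places; the function name `a` and the choice $N = 6$ are illustrative, and the integer square root `math.isqrt` is used only as a convenient way to produce the truncated digits.

```python
from fractions import Fraction
from math import isqrt

def a(k: int) -> Fraction:
    """sqrt(2) truncated to k decimal places, as an exact rational number.

    isqrt(2 * 10**(2k)) is the integer part of sqrt(2) * 10**k.
    """
    return Fraction(isqrt(2 * 10 ** (2 * k)), 10 ** k)

N = 6                                    # threshold with 10**(-N) < epsilon
tail = [a(k) for k in range(N, N + 20)]  # finitely many terms with index >= N

# Largest distance between any two of the sampled tail terms.
worst_gap = max(abs(p - q) for p in tail for q in tail)

print([str(a(k)) for k in range(4)])
print(worst_gap < Fraction(1, 10 ** N))
```

Every term is a `Fraction`, so each $a_k$ is visibly rational, and the sampled terms beyond index $N$ stay within $10^{-N}$ of each other. Only finitely many terms are compared, so this illustrates the Cauchy condition rather than proving it.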


Therefore $\mathbb{Q}$ by itself is not complete. What are you going to do? If you want to assume
completeness as a characteristic of a set containing $\mathbb{Q}$ as a subset, then you will have to
throw in all possible limits of Cauchy sequences of rational numbers. With a little bit of
work, you can fill in the holes in the number line with these irrational limits.

Practically speaking, we apply this principle quite often. Anytime you use your calculator
to give you $\sqrt{2}$, you are using a terminating decimal approximation, which, of
course, is rational. If you need a better approximation, you may use a computer to give
you more decimal places of accuracy. You're still working with a rational number, but
you're counting on the internal workings of the technology and the theory of analysis to
give you a rational number that is actually closer to the exact value. In computation, we
always work with rational numbers, except perhaps when we manipulate symbols such
as $\pi$ or $\sqrt{2}$ algebraically.


4.8.2 From completeness to the NIP 

If we can show that completeness of $\mathbb{R}$ implies the NIP, then we will have demonstrated
the logical equivalence of the LUB property, the NIP, and completeness. You'll supply the
details of what remains in Exercise 6. Here are some reminders and a suggestion about
how to proceed.

In proving the NIP from the LUB property in Section 4.7, we supposed $([a_n, b_n])$ is
a nested sequence of closed intervals whose lengths tend to zero as $n \to \infty$. One thing
we noted is that $(b_n - a_n) \to 0$ by itself does not say anything about convergence of $(a_n)$
or $(b_n)$ separately. Intervals $[a_n, b_n]$ might become arbitrarily short, but if they are not
nested, then nothing prevents them from waltzing forever up the real number line, so that
neither $(a_n)$ nor $(b_n)$ converges. However, the hypothesis condition that the $[a_n, b_n]$ are
nested means that $(a_n)$ and $(b_n)$ are bounded. Then monotonicity and the LUB property
applied to the $a_n$ and $b_n$ put a number into our hands that was just the thing we needed
to arrive at the NIP.

This time, in proving the NIP from completeness, we still begin with the assumption
that the intervals are nested and $(b_n - a_n) \to 0$. The difference this time is that the axiom
we need to apply is completeness of $\mathbb{R}$. If we can show that $(a_n)$ and $(b_n)$ are Cauchy, then
the assumption of completeness will give us a real number limit for each. This turns out
to be fruitful because $(b_n - a_n) \to 0$ makes $|b_n - a_n|$ very small eventually. To conclude
that $(a_n)$ is Cauchy, $b_n$ in the expression $|b_n - a_n|$ can be replaced by $a_m$ for $m > n$ to
have $|a_m - a_n|$, which is no larger than $|b_n - a_n|$ because $a_m$ and $a_n$ are at least as
close together as $a_n$ and $b_n$. The rest of the work in proving the NIP would be identical to
what you did in Section 4.7, where we observed $\lim a_n = \lim b_n$, and where you showed
in Exercise 3 that $\bigcap [a_n, b_n] = \{L\}$, where $L$ is the common limit of $(a_n)$ and $(b_n)$. Here is
the only theorem we need to bridge the gap from completeness to the NIP, which you'll
prove in Exercise 6.

Theorem 4.8.6. Suppose $([a_n, b_n])$ is a sequence of nested intervals such that $(b_n - a_n) \to 0$
as $n \to \infty$. Then $(a_n)$ and $(b_n)$ are Cauchy.


To make the proof of Theorem 4.8.6 easier, all you need to do is show that $(a_n)$ is
Cauchy, then observe that a similar argument will work for $(b_n)$.


EXERCISES 


1. Prove Theorem 4.8.2: A convergent sequence is Cauchy. 

2. What does it mean for a sequence (a,) not to be Cauchy? 

3. Show that the sequence $((-1)^n)$ is not Cauchy, hence diverges.

4. Prove Theorem 4.8.3: A Cauchy sequence is bounded.

5. Prove Theorem 4.8.4: A Cauchy sequence of real numbers is convergent.


6. Prove part of Theorem 4.8.6: Suppose $([a_n, b_n])$ is a sequence of nested intervals such
that $(b_n - a_n) \to 0$ as $n \to \infty$. Then $(a_n)$ is Cauchy.


Functions of a Real Variable


Now we turn our attention to functions whose domain and codomain are subsets
of $\mathbb{R}$. Functions whose domain is a subset of $\mathbb{R}$ are called functions of a real variable,
and functions whose range is a subset of $\mathbb{R}$ are called real-valued. Although the domain $S$
of a function of a real variable might be any subset of $\mathbb{R}$, some especially powerful and
interesting results can be deduced if $S$ is compact. Because this chapter deals exclusively
with real-valued functions of a real variable, we'll not state every time that the domain
and codomain are subsets of $\mathbb{R}$. When we write $f : S \to \mathbb{R}$, it’s understood that $S \subseteq \mathbb{R}$.


5.1 Bounded and Monotone Functions

First we discuss boundedness and monotonicity of a function $f : S \to \mathbb{R}$. With these
and other ideas from Sections 5.2 and 5.3, we'll point out many links to the properties
of sequences we discussed in Chapter 4.


5.1.1 Bounded functions 


The definitions of the boundedness terms for functions resemble those for sequences. 


Definition 5.1.1. Suppose $f : S \to \mathbb{R}$ is a function and $A \subseteq S$. We say $f$ is bounded from
above on $A$ if there exists $M_1 \in \mathbb{R}$ such that $f(x) \le M_1$ for all $x \in A$. Similarly, we say
$f$ is bounded from below on $A$ if there exists $M_2 \in \mathbb{R}$ such that $f(x) \ge M_2$ for all $x \in A$.
If there exists $M > 0$ such that $|f(x)| \le M$ for all $x \in A$, we say $f$ is bounded on $A$. For
each of these characteristics, if it applies on all of $S$, we say simply that $f$ is bounded from
above, bounded from below, or bounded.


In Exercises 1 and 2, you'll negate these definitions and apply all these terms to
$f : (0, \infty) \to \mathbb{R}$ defined by $f(x) = 1/x$. (See Fig. 5.1.)


Figure 5.1 $f(x) = 1/x$.


5.1.2 Monotone functions 

When we defined a sequence to be monotone increasing, the fact that the terms of the
sequence are lined up in an order made the definition easy to come by: if we pick two
indices $m < n$, it must be that $a_m \le a_n$. A similar definition works for functions.


Definition 5.1.2. Suppose $f : S \to \mathbb{R}$ is a function, $A \subseteq S$, and $a, b \in A$. Then $f$ is
said to be increasing on $A$ if $f(a) \le f(b)$ whenever $a < b$. Similarly, $f$ is said to be
decreasing on $A$ if $f(a) \ge f(b)$ whenever $a < b$. If $f(a) < f(b)$ whenever $a < b$, $f$ is
said to be strictly increasing on $A$. If $f(a) > f(b)$ whenever $a < b$, $f$ is said to be strictly
decreasing on $A$. For each of these characteristics, if it applies on all of $S$, we say simply that
$f$ is increasing, decreasing, strictly increasing, or strictly decreasing, and we group functions
of these types into the class we call monotone functions.


Go back and take a look at Exercise 9 from Section 2.4. Though we did not have the 
language of monotonicity at the time, this exercise proved the following. 


Theorem 5.1.3. Let $n \in \mathbb{N}$. Then:

1. $f(x) = x^{2n-1}$ is strictly increasing.

2. $f(x) = x^{2n}$ is strictly decreasing on $(-\infty, 0]$ and strictly increasing on $[0, \infty)$.

In your precalculus work, you probably learned about inverse functions and whether
a given function $f$ even has an inverse by asking whether it passes the horizontal line test.
The horizontal line test is an intuitive way of determining whether $f$ is one-to-one, for if
every horizontal line in the $xy$-plane crosses the graph of $f$ no more than once, then $f$
is one-to-one. Whether $f$ is onto $\mathbb{R}$ does not matter, for you can crop the codomain of $f$
down to $\operatorname{Rng} f$ so that $f : S \to \operatorname{Rng} f$ is onto. Thus if $f : S \to \mathbb{R}$ is one-to-one, then
$f^{-1} : \operatorname{Rng} f \to S$ will exist. Given that $f^{-1}$ exists, then you were probably told that its
graph could be sketched by reflecting the graph of $f$ about the diagonal line $y = x$. You
were taught to find the formula for $f^{-1}$ by switching the roles of $x$ and $y$, then solving
for $y$, and that reflecting the graph about $y = x$ is the visual result of swapping $x$ and
$y$. Think intuitively for a moment. If $f$ is a strictly increasing function, then does it pass
the horizontal line test? If so, then $f^{-1}$ exists on $\operatorname{Rng} f$. Given this, can you say anything
about monotonicity of $f^{-1}$? If you have the answers to these questions, then you have
Theorems 5.1.4 and 5.1.6. You'll prove them in Exercises 4 and 5.


Theorem 5.1.4. If f is strictly increasing (or strictly decreasing) on A, then it is one-to-one 
on A. 


As an immediate consequence of Theorems 5.1.4 and 3.2.5, we have the following: 


Corollary 5.1.5. If $f : S \to \mathbb{R}$ is strictly monotone on $A \subseteq S$, then there exists
$f^{-1} : f(A) \to A$.


Theorem 5.1.6. If $f : S \to \mathbb{R}$ is strictly increasing (or strictly decreasing), then
$f^{-1} : \operatorname{Rng}(f) \to S$ is strictly increasing (or strictly decreasing).


Theorems 5.1.3 and 5.1.6 imply that $\sqrt[2n]{x}$ is strictly increasing on $[0, \infty)$ and $\sqrt[2n-1]{x}$ is
strictly increasing on $\mathbb{R}$. In addition, with Theorem 5.1.7, which you'll prove in Exercise 6,
the function $f(x) = x^r$ is strictly increasing on $[0, \infty)$ for any $r \in \mathbb{Q}^+$. We may write
$r = m/n$ for $m, n \in \mathbb{N}$, so that $f(x) = \sqrt[n]{x^m}$.


Theorem 5.1.7. Suppose $f$ and $g$ are real-valued functions such that $g \circ f$ is defined on
some interval $(a, b)$. If $f$ and $g$ are both increasing, then so is $g \circ f$.


EXERCISES 


1. Use the logic of Definition 5.1.1 to state what the following terms mean:
(a) $f$ is not bounded from above on $A$
(b) $f$ is not bounded from below on $A$
(c) $f$ is not bounded on $A$
2. For $f(x) = 1/x$ on the interval $(0, \infty)$ show the following:
(a) $f$ is bounded on $(1, \infty)$
(b) $f$ is bounded from below on $(0, 1)$
(c) $f$ is not bounded from above on $(0, 1)$
3. Show that $f(x) = 1/(x^2 + 1)$ is strictly decreasing on $[0, \infty)$.


4. Prove part of Theorem 5.1.4: If f is strictly increasing on A, then it is one-to-one 
on A. 


5. Prove part of Theorem 5.1.6: If $f : S \to \mathbb{R}$ is strictly increasing, then
$f^{-1} : \operatorname{Rng}(f) \to S$ is strictly increasing.


6. Prove Theorem 5.1.7: Suppose $f$ and $g$ are real-valued functions such that $g \circ f$ is
defined on some interval $(a, b)$. If $f$ and $g$ are both increasing, then so is $g \circ f$.


7. State and prove other theorems analogous to Theorem 5.1.7 by varying the hypothesis 
conditions on f and g to include all combinations of increasing and decreasing. 


5.2 Limits and Their Basic Properties 


Suppose $f : S \to \mathbb{R}$ is a function that is defined on some deleted neighborhood of
$a \in \mathbb{R}$. Whether $f(a)$ exists is beside the point in the definition of limit, so we insist on
no more than a deleted neighborhood. In a way somewhat similar to our definition of
convergence of sequences as $n \to \infty$, we discuss $\lim_{x\to a} f(x)$, the limit as $x$ approaches
$a$ of $f(x)$. In this section, we'll work our way into the definition of the limit of a function,
then look at some basic theorems involving limits that bear striking parallels to those
theorems involving sequences.


5.2.1 Definition of limit 

Let’s work our way into a definition of $\lim_{x\to a} f(x)$ gradually. Let $f(x)$ be defined in
the following way:


\[ f(x) = \begin{cases} 4x - 1, & \text{if } x \ne 2 \\ 50, & \text{if } x = 2. \end{cases} \tag{5.1} \]

Defining f in this way produces a very familiar straight line, but with a hole punched in 
the graph at x = 2 and the value of f(2) artificially (but purposefully) defined to be a 
number way out of line with what you probably expect it to be. (See Fig. 5.2.) Although 
f (2) = 50, the function does not take on values anywhere near 50 if x is close to but 
different from 2. To the contrary, if x is in some small deleted neighborhood of 2, f (x) 
appears to take on values close to 7. In fact, we can ask the following question: 


Example 5.2.1. Let $f$ be defined as in Eq. (5.1) and suppose $\epsilon > 0$ is given. For the
$\epsilon$-neighborhood of 7 on the $y$-axis in Fig. 5.2, we want to find a radius $\delta > 0$ of a deleted
neighborhood of 2 on the $x$-axis so that every $x \in DN_\delta(2)$ maps into $N_\epsilon(7)$. That is, we
want to find $\delta > 0$ so that


\[ 7 - \epsilon < f(x) < 7 + \epsilon \tag{5.2} \]


for every $x \in DN_\delta(2)$.

Figure 5.2 Shows $f$ defined from Eq. (5.1).


Solution: Since the only values of $x$ we’re interested in are different from 2, we may
use $f(x) = 4x - 1$ in inequality (5.2). Thus we have


\[ 7 - \epsilon < 4x - 1 < 7 + \epsilon \tag{5.3} \]

\[ 2 - \frac{\epsilon}{4} < x < 2 + \frac{\epsilon}{4}. \tag{5.4} \]


Inequalities (5.3) and (5.4) are logically equivalent, so $x \in DN_{\epsilon/4}(2)$ implies
$7 - \epsilon < f(x) < 7 + \epsilon$. Letting $\delta = \epsilon/4$ guarantees that if $x \in DN_\delta(2)$, then $|f(x) - 7| < \epsilon$. ∎
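
The choice $\delta = \epsilon/4$ can be spot-checked numerically. The sketch below assumes the function $f$ from Eq. (5.1); the helper name `delta_works` and the sampling scheme are illustrative, and since only finitely many points are tested, a result of `True` is evidence rather than proof.

```python
from fractions import Fraction

def f(x: Fraction) -> Fraction:
    """The function from Eq. (5.1): the line 4x - 1 with f(2) moved to 50."""
    return Fraction(50) if x == 2 else 4 * x - 1

def delta_works(eps: Fraction, delta: Fraction, samples: int = 500) -> bool:
    """Spot-check that 0 < |x - 2| < delta implies |f(x) - 7| < eps."""
    for i in range(1, samples):
        offset = delta * Fraction(i, samples)   # 0 < offset < delta
        for x in (2 - offset, 2 + offset):      # points on both sides of 2
            if not abs(f(x) - 7) < eps:
                return False
    return True

eps = Fraction(1, 100)
print(delta_works(eps, eps / 4))   # delta = eps/4, as in the solution
print(delta_works(eps, eps))       # a deliberately oversized delta
```

The point $x = 2$ itself is never sampled, which matches the deleted neighborhood in the example. The second call shows that an oversized radius such as $\delta = \epsilon$ lets some sampled $x$ escape the target.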


Example 5.2.1 is effectively the determination of a deleted $\delta$-neighborhood of 2 that
maps into a given $\epsilon$-neighborhood of $y = 7$. The value of $\delta$ depends on the given value
of $\epsilon$. It illustrates the following point, which is the heart of the definition of $\lim_{x\to a} f(x)$:
Given a “target” of radius $\epsilon$ around $y = 7$, we can find a deleted neighborhood of $x = 2$
such that all values of the function in the deleted neighborhood hit somewhere in the
target. If you imagine a sequence of smaller and smaller $\epsilon$-values, it is always possible
to find sufficiently small deleted neighborhoods of $x = 2$ that map entirely within
the smaller and smaller targets. If such a deleted $\delta$-neighborhood can be found for all
$\epsilon$-neighborhoods of $y = 7$, then we say that $\lim_{x\to 2} f(x) = 7$. Here’s the definition.


Definition 5.2.2. Suppose $f : S \to \mathbb{R}$ is defined on some deleted neighborhood of $a \in \mathbb{R}$,
and $L \in \mathbb{R}$. Then we say $\lim_{x\to a} f(x) = L$ if for all $\epsilon > 0$, there exists $\delta > 0$ such that
for all $x \in S$, $0 < |x - a| < \delta$ implies $|f(x) - L| < \epsilon$. If $\lim_{x\to a} f(x) = L$, we say $f$
converges to $L$ as $x \to a$. Another way to write this is $f(x) \to L$ as $x \to a$.


To write Definition 5.2.2 symbolically, we would have


\[ \lim_{x\to a} f(x) = L \iff (\forall \epsilon > 0)(\exists \delta > 0)(\forall x \in S)(0 < |x - a| < \delta \to |f(x) - L| < \epsilon). \tag{5.5} \]




Here’s an example of a proof based on Definition 5.2.2. Some scratch work has been
omitted that leads to the value of $\delta$. See if you can figure out what $\delta$ should be on your
own.


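Example 5.2.3. Let $g$ be defined by


\[ g(x) = \frac{3x^2 + 17x + 20}{2x + 8}. \tag{5.6} \]


Show that $\lim_{x\to -4} g(x) = -7/2$.


Solution: Pick $\epsilon > 0$. Let $\delta = 2\epsilon/3$. Now if $x \ne -4$, a factor of $(x + 4)$ may be
canceled from the numerator and denominator of $g$. Thus if $0 < |x + 4| < \delta$, we
have

\[
\begin{aligned}
\left| g(x) + \frac{7}{2} \right|
&= \left| \frac{3x^2 + 17x + 20}{2x + 8} + \frac{7}{2} \right|
 = \left| \frac{(3x + 5)(x + 4)}{2(x + 4)} + \frac{7}{2} \right| \\
&= \left| \frac{3x + 5}{2} + \frac{7}{2} \right|
 = \left| \frac{3x + 12}{2} \right|
 = \frac{3}{2}\,|x + 4| < \frac{3}{2}\,\delta = \epsilon.
\end{aligned}
\]

Thus $\lim_{x\to -4} g(x) = -7/2$. ∎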
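
The algebra in this solution says that $|g(x) + 7/2|$ equals $\frac{3}{2}|x + 4|$ away from $x = -4$, which is why $\delta = 2\epsilon/3$ works. Here is a small exact-arithmetic sketch of that check; the names `gap`, `offsets`, and `worst`, the sample value $\epsilon = 1/10$, and the sampling grid are assumptions made for illustration only.

```python
from fractions import Fraction

def g(x: Fraction) -> Fraction:
    """g from Eq. (5.6); undefined at x = -4, where the denominator vanishes."""
    return (3 * x * x + 17 * x + 20) / (2 * x + 8)

L = Fraction(-7, 2)   # the claimed limit as x -> -4

def gap(x: Fraction) -> Fraction:
    """Distance |g(x) - L| from the claimed limit."""
    return abs(g(x) - L)

eps = Fraction(1, 10)
delta = 2 * eps / 3   # the delta chosen in the solution

# Sample points with 0 < |x + 4| < delta on both sides of -4.
offsets = [delta * Fraction(i, 200) for i in range(1, 200)]
worst = max(gap(-4 + sign * h) for h in offsets for sign in (1, -1))

print(worst < eps)
print(all(gap(-4 + h) == Fraction(3, 2) * h for h in offsets))
```

The second line of output confirms, on the sampled points, the identity $|g(x) + 7/2| = \frac{3}{2}|x + 4|$ that drives the proof. As with any finite sample, this supports the argument but does not replace it.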


In the same way that $f$ from Eq. (5.1) seems a little contrived because of the special
effort we took to define $f(2) = 50$, $g$ from Example 5.2.3 might seem contrived because
nothing would seem to prevent us from canceling the $(x + 4)$ factor from Eq. (5.6).
Granted, $g$ is merely the straight line $y = (3x + 5)/2$ with a hole punched in the domain
at $x = -4$ by the introduction of an $(x + 4)$ factor in the numerator and denominator.
But $f$ and $g$ are good first examples to illustrate that the limit of a function as $x \to a$
has nothing to do with the value or even the existence of $f(a)$. Whether $f(a)$ exists and
is the same as the limit as $x \to a$ is called continuity, and we'll look at that in Section 5.5.
There are plenty of examples where a hole in the domain occurs naturally, and we'll look
at one later in this section.


5.2.2 Basic theorems of limits 


There is a real similarity between the definition of convergence of a sequence $\lim_{n \to \infty} a_n$
and convergence of a function $\lim_{x \to a} f(x)$. In convergence of sequences, a given $\epsilon > 0$
must be associated with a threshold term beyond which all terms of the sequence fall in
the $\epsilon$-neighborhood of the limit. In the convergence of a function, a given $\epsilon > 0$ must
be associated with a $\delta$-radius of a deleted neighborhood of $a$, within which all values
of the function, except perhaps $f(a)$, must fall in the $\epsilon$-neighborhood of the limit. The
good news is that the similarity of these definitions makes for plenty of similar theorems
involving $\lim_{x \to a} f(x)$, with similar proofs to match. Here is a barrage of theorems
involving convergence of a function. As we present them, we’ll often link them back
to sequence theorems and properties. We'll prove some of them here to illustrate the
similarities, but you'll prove most of them in the exercises by mimicking your work from
Section 4.6.


Theorem 5.2.4. For any $a \in \mathbb{R}$, the constant function $f(x) = c$ satisfies $\lim_{x \to a} f(x) = c$.

Proof: Pick any $a \in \mathbb{R}$ and $\epsilon > 0$. Let $\delta = 1$. Then if $0 < |x - a| < \delta$, $|f(x) - c| =
|c - c| = 0 < \epsilon$. Thus $\lim_{x \to a} f(x) = c$. ∎


Theorem 5.2.5. For any $a \in \mathbb{R}$, $\lim_{x \to a} x = a$.


Theorem 5.2.6. If $f(x) \to L$ as $x \to a$, then there exists a deleted neighborhood of $a$ where
$f$ is bounded.


Theorem 5.2.7. If $f(x) \to 0$ as $x \to a$, and if $g$ is bounded on a deleted neighborhood
of $a$, then $f(x)g(x) \to 0$ as $x \to a$.


Theorem 5.2.8. If $f(x) \to L_1$ and $g(x) \to L_2$ as $x \to a$, then as $x \to a$:
1. $f(x) + g(x) \to L_1 + L_2$,
2. $f(x)g(x) \to L_1 L_2$.

Proof: We prove part 1 and leave part 2 to you. Pick $\epsilon > 0$. Since $f(x) \to L_1$ as
$x \to a$, there exists $\delta_1 > 0$ such that $0 < |x - a| < \delta_1$ implies $|f(x) - L_1| < \epsilon/2$.
Similarly, since $g(x) \to L_2$, there exists $\delta_2 > 0$ such that $0 < |x - a| < \delta_2$ implies
$|g(x) - L_2| < \epsilon/2$. Let $\delta = \min\{\delta_1, \delta_2\}$. Then $\delta > 0$, and if $0 < |x - a| < \delta$, both
$0 < |x - a| < \delta_1$ and $0 < |x - a| < \delta_2$ are satisfied. Thus we have

$$|[f(x) + g(x)] - (L_1 + L_2)| \le |f(x) - L_1| + |g(x) - L_2| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. \tag{5.8}$$

∎

Three results follow as immediate corollaries of Theorems 5.2.4 and 5.2.8. 


Corollary 5.2.9. If $f(x) \to L$ as $x \to a$, and if $c \in \mathbb{R}$, then $cf(x) \to cL$ as $x \to a$.


Corollary 5.2.10. If $f(x) \to L_1$ and $g(x) \to L_2$ as $x \to a$, then $f(x) - g(x) \to L_1 - L_2$
as $x \to a$.


Corollary 5.2.11. If $P(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$, then for all $a \in \mathbb{R}$,
$P(x) \to P(a)$ as $x \to a$.


Theorem 5.2.12. Suppose $g(x) \to L \ne 0$ as $x \to a$. Then there exists a deleted neighborhood
of $a$ where $g$ can be bounded away from zero. In particular, if $L > 0$, there exist
$M > 0$ and $\delta > 0$ such that $0 < |x - a| < \delta$ implies $g(x) > M$. If $L < 0$, there exist
$M < 0$ and $\delta > 0$ such that $0 < |x - a| < \delta$ implies $g(x) < M$.


Theorem 5.2.13. Suppose $g(x) \to L \ne 0$ as $x \to a$. Then $1/g(x) \to 1/L$ as $x \to a$.


Corollary 5.2.14. Suppose $f(x) \to L_1$ and $g(x) \to L_2 \ne 0$ as $x \to a$. Then
$f(x)/g(x) \to L_1/L_2$ as $x \to a$.




Theorem 5.2.15. Suppose $f$ is defined by

$$f(x) = \frac{P_1(x)}{P_2(x)} = \frac{a_m x^m + a_{m-1} x^{m-1} + \cdots + a_1 x + a_0}{b_n x^n + b_{n-1} x^{n-1} + \cdots + b_1 x + b_0}.$$

Then, provided $P_2(a) \ne 0$, $f(x) \to P_1(a)/P_2(a)$ as $x \to a$.

Proof: Apply Corollaries 5.2.11 and 5.2.14 and the result follows. ∎


Similar to the comments on sequences in Exercise 13 from Section 4.6, if $f(x) \to L$
and $g(x) \to L$ as $x \to a$, then $f(x) - g(x) \to 0$. Thus if $\epsilon > 0$, there exists $\delta > 0$
such that $0 < |x - a| < \delta$ implies $|f(x) - g(x)| < \epsilon/2$. This fact will come in handy in
proving the following:


Theorem 5.2.16 (Sandwich Theorem). Suppose $f$, $g$, and $h$ are functions with the property
that $g(x) \le h(x) \le f(x)$ for all $x$ in some deleted neighborhood of $a$. Suppose also
that $f(x) \to L$ and $g(x) \to L$ as $x \to a$. Then $h(x) \to L$ as $x \to a$.


Earlier we promised an example of a function with a natural hole in the domain
where we want to address the limit. The function is $f(x) = \sin x / x$ at $a = 0$. Except
for a passing glance at sine in Section 4.5, we have never mentioned any trigonometric,
exponential, or logarithmic functions. There is a reason for this; definitions of these
functions that are rooted in the axioms of $\mathbb{R}$ are not possible to come by at this stage
of our game. Only the algebraic functions arise from the theory we’ve developed thus
far, and they can make for some pretty sticky proofs all by themselves. Strict definitions
of these nonalgebraic functions, called transcendental functions, come later. However, if
we kick back for a while and give ourselves the freedom to talk about sine, cosine, and
tangent in the familiar language of the unit circle, then we can apply Theorem 5.2.16 to
show

$$\lim_{x \to 0} \frac{\sin x}{x} = 1. \tag{5.9}$$

In trigonometry, we take a real $x$-numberline and wrap it, as it were, around the unit
circle in the $uv$-plane with $x = 0$ placed at $(u, v) = (1, 0)$ and the positive half of the
$x$-axis wrapped counterclockwise. For $x \in \mathbb{R}$, $\cos x$ and $\sin x$ are defined to be the $u$ and
$v$ coordinates, respectively, of the point where $x$ falls in the $uv$-plane. If $0 < x < \pi/2$,
Fig. 5.3 shows how we may view $x$ as an arc length, and $\sin x$, $\cos x$, and $\tan x$ as the
lengths of segments in Quadrant I. Similar triangles in the figure will convince you that
the length labeled $\tan x$ is correct. If $0 < x < \pi/2$ as is suggested in the sketch, the
geometry of the unit circle reveals

$$\sin x < x < \tan x. \tag{5.10}$$

If $-\pi/2 < x < 0$ so that $x$ is in Quadrant IV, then $\sin x$ and $\tan x$ are also negative, and
we have

$$\tan x < x < \sin x. \tag{5.11}$$

Figure 5.3 Basic trigonometric functions.


If we take both parts of inequality (5.10) separately and solve each for $\sin x / x$, we can
put them back together into the single inequality

$$\cos x < \frac{\sin x}{x} < 1. \tag{5.12}$$

If we do the same thing for (5.11), remembering that $x < 0$ and $\sin x < 0$, we also arrive
at (5.12), so that (5.12) holds for all $x \in DN_{\pi/2}(0)$. Let’s accept from the geometry of the
unit circle and the definition of $\cos x$ that $\cos x \to 1$ as $x \to 0$. Then, by Theorem 5.2.16
applied with $\cos x$ and the constant function 1 as the two bounding functions,
Eq. (5.9) holds.
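Inequality (5.12) can be observed numerically at a few sample points of $DN_{\pi/2}(0)$. This is a rough floating-point sketch only; the sample points and the helper names `ratio` and `sandwiched` are choices made for this illustration, and very small $x$ are avoided because $\cos x$ rounds to exactly 1 in double precision there.

```python
import math


def ratio(x):
    """sin(x)/x, defined for every x except 0."""
    return math.sin(x) / x


def sandwiched(x):
    """True when cos x < sin x / x < 1, that is, when (5.12) holds at x."""
    return math.cos(x) < ratio(x) < 1


# Positive and negative sample points, all inside (-pi/2, pi/2) and nonzero.
samples = [s * t for t in (1.5, 1.0, 0.5, 0.1, 0.01, 0.001) for s in (1, -1)]
for x in samples:
    print(f"{x:+.3f}  cos x = {math.cos(x):.9f}  sin x / x = {ratio(x):.9f}  {sandwiched(x)}")
```

As the sample points shrink toward 0, both the lower bound $\cos x$ and the squeezed value $\sin x / x$ crowd up against 1, which is what Theorem 5.2.16 predicts.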

Let’s remind ourselves of some of the theory of Cauchy sequences in order to motivate
our last theorem from this section. An immediate result of convergence of a sequence is
that it is Cauchy (Theorem 4.8.2). If the terms of a sequence cluster around some $L \in \mathbb{R}$,
then they must cluster around each other. Similarly, we can prove the following theorem,
which states that if $f(x) \to L$ as $x \to a$, then the values of $f$ must not vary much from each
other in small deleted neighborhoods of $a$. You'll prove it in Exercise 9.


Theorem 5.2.17. Suppose $f(x) \to L$ as $x \to a$. Then for every $\epsilon > 0$, there exists $\delta > 0$
such that $x_1, x_2 \in DN_\delta(a)$ implies $|f(x_1) - f(x_2)| < \epsilon$.


If a sequence is not Cauchy, it is not convergent. In Exercise 6, Section 4.8, you stated
what it means for a sequence not to be Cauchy: There exists $\epsilon > 0$ such that, for all $N \in \mathbb{N}$,
there exist $m, n > N$ where $|a_m - a_n| \ge \epsilon$. That is, there is some fixed $\epsilon > 0$ so that
no matter where in the sequence you put your finger, somewhere farther down the line
there will be two terms that are at least $\epsilon$ apart. What does it mean for the property in
Theorem 5.2.17 not to hold? The answer: There exists $\epsilon > 0$ such that for all $\delta > 0$, there
exist $x_1, x_2 \in DN_\delta(a)$ with $|f(x_1) - f(x_2)| \ge \epsilon$. That is, there is some fixed $\epsilon > 0$ such
that every deleted $\delta$-neighborhood of $a$, no matter how small, contains two points whose
functional values are at least $\epsilon$ apart. If a function $f$ has this property, then $\lim_{x \to a} f(x)$
does not exist.


Example 5.2.18. Let $f$ be defined in the following way:

$$f(x) = \begin{cases} 1, & \text{if } x \text{ is rational}, \\ 0, & \text{if } x \text{ is irrational}. \end{cases} \tag{5.13}$$

If $a \in \mathbb{R}$, then $\lim_{x \to a} f(x)$ does not exist. For we may let $\epsilon = 1/2$ and choose any
$\delta > 0$. Then by Exercises 7 and 8, Section 4.1, the deleted $\delta$-neighborhood of $a$ contains
a rational number $x_1$ and an irrational number $x_2$, and $|f(x_1) - f(x_2)| = 1 > \epsilon$. This
function is sometimes called the salt and pepper function because its values of 0 and 1 are
sprinkled up and down the domain like grains of salt and pepper. No one really knows
which is the salt and which is the pepper.


EXERCISES 


1. Prove Theorem 5.2.5: For any $a \in \mathbb{R}$, $\lim_{x \to a} x = a$.


2. Prove Theorem 5.2.6: If $f(x) \to L$ as $x \to a$, then there exists a deleted neighborhood
of $a$ where $f$ is bounded.


3. Prove Theorem 5.2.7: If $f(x) \to 0$ as $x \to a$, and if $g$ is bounded on a deleted
neighborhood of $a$, then $f(x)g(x) \to 0$ as $x \to a$.


4. Prove part 2 of Theorem 5.2.8: If $f(x) \to L_1$ and $g(x) \to L_2$ as $x \to a$, then
$f(x)g(x) \to L_1 L_2$.


5. Prove Theorem 5.2.12: Suppose $g(x) \to L \ne 0$ as $x \to a$. Then there exists a deleted
neighborhood of $a$ where $g$ can be bounded away from zero. In particular, if $L > 0$,
there exist $M > 0$ and $\delta > 0$ such that $0 < |x - a| < \delta$ implies $g(x) > M$. If
$L < 0$, there exist $M < 0$ and $\delta > 0$ such that $0 < |x - a| < \delta$ implies $g(x) < M$.


6. Prove Theorem 5.2.13: Suppose $g(x) \to L \ne 0$ as $x \to a$. Then $1/g(x) \to 1/L$ as
$x \to a$.


7. Prove Theorem 5.2.16: Suppose $f$, $g$, and $h$ are functions with the property that
$g(x) \le h(x) \le f(x)$ for all $x$ in some deleted neighborhood of $a$. Suppose also that
$f(x) \to L$ and $g(x) \to L$ as $x \to a$. Then $h(x) \to L$ as $x \to a$.


8. Assuming $-1 \le \sin x \le 1$ for all $x \in \mathbb{R}$, show that $\lim_{x \to 0} x \sin x = 0$.


9. Prove Theorem 5.2.17: Suppose $f(x) \to L$ as $x \to a$. Then for every $\epsilon > 0$, there
exists $\delta > 0$ such that $x_1, x_2 \in DN_\delta(a)$ implies $|f(x_1) - f(x_2)| < \epsilon$.


10. The signum function, $\mathrm{sgn} : \mathbb{R} \setminus \{0\} \to \mathbb{R}$, is defined in the following way:

$$\mathrm{sgn}\, x = \begin{cases} -1, & \text{if } x < 0, \\ 1, & \text{if } x > 0. \end{cases} \tag{5.14}$$

Show that $\lim_{x \to 0} \mathrm{sgn}\, x$ does not exist.


11. Use familiar values of $\sin x$ to show that $\lim_{x \to 0} \sin(1/x)$ does not exist.
5.3 More on Limits


In this section, we consider two topics relating to the limit of a function. First, we discuss 
one-sided limits, where we consider x approaching a either from the right or from the 
left separately. Then, instead of merely commenting on parallels between convergence of 
sequences and convergence of a function, we'll actually tie the two ideas together with 
an important theorem. 


5.3.1 One-sided limits 


In defining $\lim_{x \to a} f(x) = L$, we insisted that $f$ be defined on both sides of $a$ and that all
points to the nearby left and right of $a$ map into $N_\epsilon(L)$. If $f$ is defined only on one side
of $a$, or perhaps if values of $f$ to the immediate left of $a$ behave differently from those
to the immediate right (as in $\mathrm{sgn}\, x$), we can discuss the limit of $f(x)$ as $x$ approaches $a$
from the left or from the right separately. Instead of using entire deleted neighborhoods
of $a$, we use only the left- or right-half of them.


Definition 5.3.1. Suppose $f$ is defined on some interval $(a, b)$. Then we say that
$\lim_{x \to a^+} f(x) = L$ (read “...as $x$ approaches $a$ from the right”) if for all $\epsilon > 0$, there
exists $\delta > 0$ such that for all $x \in (a, b)$, $a < x < a + \delta$ implies $|f(x) - L| < \epsilon$; $L$ is
called the right-hand limit of $f$ at $a$. Similarly, if $f$ is defined on some interval $(c, a)$, we say
that $\lim_{x \to a^-} f(x) = L$ ($x$ approaches $a$ from the left) if for all $\epsilon > 0$, there exists $\delta > 0$
such that for all $x \in (c, a)$, $a - \delta < x < a$ implies $|f(x) - L| < \epsilon$; $L$ is called the left-hand
limit of $f$ at $a$.


Theorem 5.3.2. For a function $f$, $\lim_{x \to a} f(x) = L$ if and only if

$$\lim_{x \to a^+} f(x) = \lim_{x \to a^-} f(x) = L. \tag{5.15}$$


Theorem 5.3.2 can be argued in an interesting way from the logical structure of the
definitions of the limits involved, so we'll provide a rather conversational argument.
In arguing the $\Rightarrow$ direction, we would pick an $\epsilon > 0$ and note that there exists $\delta > 0$
such that $x \in DN_\delta(a)$ implies $|f(x) - L| < \epsilon$. However, the condition $x \in DN_\delta(a)$ is
weaker than either $a < x < a + \delta$ or $a - \delta < x < a$. Thus the statement

$$x \in DN_\delta(a) \rightarrow |f(x) - L| < \epsilon$$

is stronger than either

$$a < x < a + \delta \rightarrow |f(x) - L| < \epsilon$$

or

$$a - \delta < x < a \rightarrow |f(x) - L| < \epsilon.$$

Thus the existence of $\lim_{x \to a} f(x)$ implies the existence of $\lim_{x \to a^+} f(x)$ and
$\lim_{x \to a^-} f(x)$.




In arguing $\Leftarrow$, for a given $\epsilon > 0$, the right-hand limit provides us with $\delta_1$, and
the left-hand limit provides us with $\delta_2$, each having the requisite properties. If we let
$\delta = \min\{\delta_1, \delta_2\}$, then $x \in DN_\delta(a)$ implies either

$$a < x < a + \delta \le a + \delta_1 \quad \text{or} \quad a - \delta_2 \le a - \delta < x < a,$$

so that $|f(x) - L| < \epsilon$ in either case.


Example 5.3.3. Let $f$ be defined in the following way:

$$f(x) = \begin{cases} x^2 - 1, & \text{if } x < 2, \\ \dfrac{1}{x - 1}, & \text{if } x > 2. \end{cases} \tag{5.16}$$

By Theorem 5.2.15, $\lim_{x \to 2}(x^2 - 1) = 3$ and $\lim_{x \to 2} 1/(x - 1) = 1$ (in the two-sided
sense). Thus, by the $\Rightarrow$ direction of Theorem 5.3.2, $\lim_{x \to 2^-}(x^2 - 1) = 3$ and
$\lim_{x \to 2^+} 1/(x - 1) = 1$. However, $\lim_{x \to 2^-} f(x) = 3$ and $\lim_{x \to 2^+} f(x) = 1$, so that $\lim_{x \to 2} f(x)$
fails to exist by the contrapositive of the $\Rightarrow$ direction of Theorem 5.3.2.
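The two one-sided limits in Example 5.3.3 can be watched numerically by evaluating $f$ at points that close in on 2 from each side. This is an informal sketch under the assumption that $f$ is only evaluated away from $x = 2$; the lists `left` and `right` are names chosen for this illustration.

```python
def f(x):
    """f from Example 5.3.3, evaluated away from x = 2 only."""
    if x < 2:
        return x**2 - 1
    if x > 2:
        return 1 / (x - 1)
    raise ValueError("the value at x = 2 plays no role in the one-sided limits")


# Points 2 - 10^-k approach 2 from the left; points 2 + 10^-k from the right.
left = [f(2 - 10.0**-k) for k in range(1, 7)]
right = [f(2 + 10.0**-k) for k in range(1, 7)]

for k, (lv, rv) in enumerate(zip(left, right), start=1):
    print(f"k={k}  f(2 - 10^-{k}) = {lv:.8f}   f(2 + 10^-{k}) = {rv:.8f}")
```

The left-hand values settle near 3 and the right-hand values near 1, so no single $L$ can serve both sides.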


5.3.2 Sequential limit of f 


The following theorem states that our $\epsilon$-$\delta$ definition of $\lim_{x \to a} f(x) = L$ from
Definition 5.2.2 is logically equivalent to a characteristic involving sequences.


Theorem 5.3.4. Suppose $f : S \to \mathbb{R}$ is a function. Then $\lim_{x \to a} f(x) = L$ if and only if
every sequence $(a_n)$ with the properties that $a_n \to a$ and $a_n \ne a$ for all $n \in \mathbb{N}$ also has the
property that $f(a_n) \to L$.


You’ll prove Theorem 5.3.4 in Exercise 3. The $\Rightarrow$ direction is the easier one, for if
entire $\delta$-neighborhoods of $a$ map within $\epsilon$ of $L$, then the fact that $a_n \to a$ will certainly
guarantee that all terms of $(a_n)$ beyond some threshold map within $\epsilon$ of $L$ also. Proving
$\Leftarrow$ is probably best done by contrapositive. If $\lim_{x \to a} f(x) \ne L$ (What does this mean?),
then you should be able to use a sequence of progressively smaller and smaller $\delta$-values
to create a sequence $(a_n)$ such that $a_n \ne a$ for all $n$, $a_n \to a$, and $f(a_n) \not\to L$.
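The sequential characterization in Theorem 5.3.4 also gives a quick way to see a limit fail. The sketch below reuses the piecewise function of Example 5.3.3 and feeds it the sequence $a_n = 2 + (-1)^n/n$, which is an assumption made for this illustration: it converges to 2, never equals 2, and alternates sides.

```python
def f(x):
    """Piecewise function of Example 5.3.3, evaluated away from x = 2."""
    if x < 2:
        return x**2 - 1
    if x > 2:
        return 1 / (x - 1)
    raise ValueError("f is not evaluated at x = 2 here")


def a(n):
    """a_n = 2 + (-1)^n / n: converges to 2 and is never equal to 2."""
    return 2 + (-1) ** n / n


values = [f(a(n)) for n in range(1, 2001)]

# Even-indexed terms sit to the right of 2, odd-indexed terms to the left.
print("n = 2000:", values[1999])
print("n = 1999:", values[1998])
```

The even-indexed values drift toward 1 and the odd-indexed values toward 3, so $(f(a_n))$ does not converge, and by Theorem 5.3.4 the two-sided limit at 2 cannot exist.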

The logical equivalence of our $\epsilon$-$\delta$ definition of limit and the sequential limit
characteristic in Theorem 5.3.4 means that it is possible to define the statement $\lim_{x \to a} f(x) = L$
either in terms of $\epsilon$ and $\delta$ or in terms of sequences. Some authors prefer to define
function limits sequentially by making Theorem 5.3.4 their definition of the limit of a
function. Our $\epsilon$-$\delta$ definition would then be a theorem. They then construct proofs of
theorems such as those from Section 5.2 from the theory of sequences we discussed in
Section 4.6. If you have the theory of sequences already under your belt, function limit
proofs become very easy. For example, to show that if $f(x) \to L_1$ and $g(x) \to L_2$ as
$x \to a$, then $f(x) + g(x) \to L_1 + L_2$, the proof would go something like this.


Proof of part 1 of Theorem 5.2.8: We show that every sequence $(a_n)$ such that $a_n \ne a$ for all
$n$ and $a_n \to a$ also satisfies $f(a_n) + g(a_n) \to L_1 + L_2$. Suppose $a_n \to a$, where
$a_n \ne a$ for all $n$. Then since $f(x) \to L_1$ as $x \to a$, we have that $f(a_n) \to L_1$ as
$n \to \infty$. Similarly, since $g(x) \to L_2$ as $x \to a$, then $g(a_n) \to L_2$ as $n \to \infty$. By
Theorem 4.6.8, $f(a_n) + g(a_n) \to L_1 + L_2$. Since $(a_n)$ was chosen arbitrarily, we
have $f(x) + g(x) \to L_1 + L_2$ as $x \to a$. ∎
Perhaps you feel a little cheated at this point. After all, if only we had proved
Theorem 5.3.4 at the beginning of Section 5.2, then all our proofs in Section 5.2 would have
been one-liners. There is some truth to that. However, $\epsilon$-$\delta$ proofs pervade mathematics,
and the fact that your first $\epsilon$-$\delta$ proofs could be done easily by merely mimicking work
from Section 4.6 was probably a humane way for you to be introduced to them.


EXERCISES 


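1. Let $f$ be defined in the following way:

$$f(x) = \begin{cases} 2\,\mathrm{sgn}\, x, & \text{if } x < 1, \\ x + 2, & \text{if } x > 1. \end{cases} \tag{5.17}$$

Evaluate the following, with reference to applicable theorems:

(a) $\lim_{x \to 0^-} f(x)$
(b) $\lim_{x \to 0^+} f(x)$
(c) $\lim_{x \to 0} f(x)$
(d) $\lim_{x \to 1^-} f(x)$
(e) $\lim_{x \to 1^+} f(x)$
(f) $\lim_{x \to 1} f(x)$

2. Show that $\lim_{x \to 0^+} \sqrt{x} = 0$.


3. Prove Theorem 5.3.4: Suppose $f : S \to \mathbb{R}$. Then $\lim_{x \to a} f(x) = L$ if and only if every
sequence $(a_n)$ such that $a_n \ne a$ for all $n \in \mathbb{N}$ and $a_n \to a$ satisfies $f(a_n) \to L$.¹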


5.4 Limits Involving Infinity


Now we want to take the concepts and language of limits and extend them to include the
symbols $\pm\infty$, even though $\infty$ is not a real number. We have used the symbol $\infty$ before
in our work with sequences, and when we defined $\lim_{n \to \infty} a_n$, we probably raised no
eyebrows because we defined precisely what we mean by $\lim_{n \to \infty} a_n$ in terms of $\mathbb{N}$, the
domain of the sequence. In this section, we want to take the expression $\lim_{x \to a} f(x) = L$
and replace either $a$ or $L$ (or both) with one of the symbols $\pm\infty$. Up until now, the
fact that $a$ and $L$ are real numbers allowed us to discuss $\epsilon$-neighborhoods of $L$ and
$\delta$-neighborhoods of $a$. With our current definition of neighborhood, it makes no sense to
talk about a neighborhood of infinity, unless of course we decide to give this term meaning.
We will do precisely this. In fact, as we go, we'll concoct extended, new meanings for old
language and revamp some of our visual imagery to show that there just might be some
way to understand $\infty$ so that we can almost treat it as a number.


¹ Prove $\Leftarrow$ contrapositively by using a sequence $\delta_n = 1/n$ to generate a sequence $a_n$ where $a_n \to a$ and
$f(a_n) \not\to L$.




The language we'll use in replacing $a$ or $L$ with $\pm\infty$ will go something like this. In
discussing $\lim_{x \to \pm\infty} f(x) = L$, we'll call these limits at positive or negative infinity. When we
discuss $\lim_{x \to a} f(x) = \pm\infty$, we'll call these limits of positive or negative infinity.
Graphically, you can imagine that a limit $L$ at infinity corresponds to a horizontal asymptote in
the graph of $f$, and a limit of infinity at $a$ corresponds to a vertical asymptote. There are
buckets and buckets of ways to combine and specialize the ideas we'll discuss here. With
both $+\infty$ and $-\infty$, and with either $a$ or $L$ or both being replaced by $\pm\infty$, with two-sided
and one-sided limits, there are several new terms we can define. By hitting a few, you will
probably catch on to what the others ought to be, so we'll not address all of them.


5.4.1 Limits at infinity 


Let’s begin by replacing $a$ with $+\infty$ because it almost exactly replicates the theory of
sequences. Since a sequence $(a_n)$ is really a real-valued function whose domain is $\mathbb{N}$,
our way of graphing a sequence as we did in Fig. 4.1 makes convergence to $L$ easily
visualizable as a horizontal asymptote of discrete points in the plane. If we imagine filling in
values of the function to other real numbers so that the domain is some interval $(a, +\infty)$,
then the following definition seems to be a natural adaptation of Definition 4.6.1.


Definition 5.4.1. Suppose $f : S \to \mathbb{R}$ is defined on some interval $(a, +\infty)$. Then we say
$\lim_{x \to +\infty} f(x) = L$ if for all $\epsilon > 0$, there exists $M > 0$ such that $x > M$ implies
$|f(x) - L| < \epsilon$.


Notice the similarity between Definitions 5.4.1 and 4.6.1. Given $\epsilon > 0$, there exists
a threshold point in the domain beyond which all values of the function fall within $\epsilon$
of $L$. A little thought will convince you that this definition gives rise to a whole slew of
theorems involving limits of functions at $+\infty$.


Theorem 5.4.2. $\lim_{x \to +\infty} c = c$.


Theorem 5.4.3. $\lim_{x \to +\infty} 1/x = 0$.
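To see Definition 5.4.1 at work on Theorem 5.4.3, one natural threshold is $M = 1/\epsilon$, since $x > 1/\epsilon > 0$ forces $|1/x - 0| = 1/x < \epsilon$. The text does not state this choice, so it is an assumption of the sketch below, as are the helper names `threshold` and `holds_beyond`; the check samples only a handful of points beyond $M$ and is no substitute for the proof.

```python
def threshold(eps):
    """A threshold M that works for f(x) = 1/x: take M = 1/eps (an assumed choice)."""
    return 1 / eps


def holds_beyond(eps, scales=(1.001, 2, 10, 1e3, 1e6)):
    """Check |1/x - 0| < eps at sample points x = M * s, each larger than M."""
    M = threshold(eps)
    return all(abs(1 / (M * s)) < eps for s in scales)


for eps in (0.5, 1e-3, 1e-9):
    print(eps, threshold(eps), holds_beyond(eps))
```

The same pattern, a threshold in the domain computed from $\epsilon$, is exactly what the threshold index $N$ did for sequences in Definition 4.6.1.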


Theorem 5.4.4. If $f(x) \to L_1$ and $g(x) \to L_2$ as $x \to +\infty$, then $f(x) + g(x) \to L_1 + L_2$
and $f(x)g(x) \to L_1 L_2$ as $x \to +\infty$. If $L_2 \ne 0$, then $f(x)/g(x) \to L_1/L_2$ as
$x \to +\infty$.


On and on the theorems go that exactly parallel our previous work. The limit of
polynomial over polynomial and even a sandwich theorem seem strangely translucent.
The logic of the proofs is almost identical to that from before, so we will not ask you
to supply all the proofs in full detail. However, you should always make sure you're able
to provide them when necessary. Do we need to bother defining $\lim_{x \to -\infty} f(x) = L$?
Perhaps an adaptation of Definition 5.4.1 is so clear that we do not need to state it
formally here. And, of course, all the theorems follow.

Now let’s extend our language of neighborhood to include infinity. When we say
$f(x) \to L$ as $x \to a$, we mean that any neighborhood of $L$ has a corresponding deleted
neighborhood of $a$ that maps into it. Is there a way to use the same language for the
statement $f(x) \to L$ as $x \to +\infty$? Could we say that any neighborhood of $L$ has a
corresponding neighborhood of $+\infty$ that maps into it? Of course we could, if we were
to define a neighborhood of $+\infty$ to be an interval of the form $(M, +\infty)$. Similarly, a
neighborhood of $-\infty$ could be defined as an interval of the form $(-\infty, M)$.

Figure 5.4 $\mathbb{R}$ with the point at infinity.

We're catching a glimpse of the extended real numbers, and developing an imagery of
two phantom points $\pm\infty$ somewhere way off the left and right ends of the numberline.
An apparent difference is that neighborhoods of $+\infty$ or $-\infty$ are not two-sided. If we use
the single symbol $\infty$ instead of both $\pm\infty$, then we can create a nice way of visualizing
$\mathbb{R} \cup \{\infty\}$, the extended real numbers, where $+\infty$ and $-\infty$ are merged into the one point.
Here’s how.

Imagine a real numberline with a circle sitting on top of it as in Fig. 5.4. Imagine
each $a \in \mathbb{R}$ being mapped to a corresponding point $(x, y)$ on the circle with the help of
the diagonal line illustrated in the figure. This geometric way of mapping all points in $\mathbb{R}$
to points on the circle suggests a one-to-one function from $\mathbb{R}$ onto all points of the circle
except the north pole $(0, 1)$. If you want an explicit formula for the coordinates of the
point $(x, y)$ in terms of the value of $a \in \mathbb{R}$, you can obtain it easily enough with the help
of similar triangles (Exercise 1), but it is not necessary to understanding the principle.
The point is that $\mathbb{R}$ is now visualized in a different way. Instead of $\mathbb{R}$ being illustrated as
an infinite line, it’s now a circle, except that there is one point of the circle that has not
been associated with a real number in this mapping. Let’s call the point at $(0, 1)$ the point
at infinity, $\infty$. The extended real numbers then become $\mathbb{R} \cup \{\infty\}$, where $+\infty$ and $-\infty$
are really the same point, and $\infty$ is nothing more than a formal symbol thrown in, along
with a bunch of rules about how you’re supposed to use it.

With this imagery, a natural definition for a neighborhood of $\infty$ would seem to be

$$N_M(\infty) = (-\infty, -M) \cup (M, +\infty) = \{x : |x| > M\}, \tag{5.18}$$

even though such a neighborhood does not have radius $M$. Limits as $x \to +\infty$ or
$x \to -\infty$ then look like one-sided limits:

$$\lim_{x \to +\infty} f(x) = \lim_{x \to \infty^-} f(x) \quad \text{and} \quad \lim_{x \to -\infty} f(x) = \lim_{x \to \infty^+} f(x). \tag{5.19}$$


Thus we have arrived at the point where $\lim_{x \to a} f(x) = L$ has meaning for all $L \in \mathbb{R}$
and all $a \in \mathbb{R} \cup \{\infty\}$. The only thing we have to keep in mind is that the neighborhoods
of $\infty$ are not $\delta$-neighborhoods as they are for real numbers in our standard view of $\mathbb{R}$.
However, if we view $\mathbb{R} \cup \{\infty\}$ as the circle, and as long as the idea of a neighborhood
of $\infty$ is properly understood, then all the theorems will look exactly the same. After all,
we concocted the definition of a neighborhood of $\infty$ in order to make that happen. In
this rather novel view, the notation $\lim_{x \to \infty} f(x) = L$ would mean $f$ has a horizontal
asymptote at $y = L$ both to the left and right.

The extended real numbers are not usually the context for a real analysis course. Also, 
the point at infinity is usually reserved for a course in complex analysis, where the entire 
xy-plane is mapped to a sphere sitting on (0, 0) in a way analogous to what we described 
here in two dimensions. The north pole of the sphere becomes the point at infinity. Keep 
in mind when you see the expression $x \to \infty$ in the context of the real numbers that it
is generally understood to mean $x \to +\infty$.


5.4.2 Limits of infinity 


Now let’s replace $L$ with $+\infty$ in the expression $\lim_{x \to a} f(x) = L$.


Definition 5.4.5. Suppose $f$ is defined on some deleted neighborhood of $a \in \mathbb{R}$. We say
$\lim_{x \to a} f(x) = +\infty$ if for all $M > 0$, there exists $\delta > 0$ such that $0 < |x - a| < \delta$
implies $f(x) > M$.


Definition 5.4.5 is a two-sided definition, so the graph of a function $f$ for which
$\lim_{x \to a} f(x) = +\infty$ will have a vertical asymptote at $a$, both sides of which head upward.
Definitions for $\lim_{x \to a} f(x) = -\infty$ and corresponding one-sided limits should be clear,
so we won't bother to state them here. However, you will in Exercise 6.


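Example 5.4.6. Show $\lim_{x \to 0} 1/x^2 = +\infty$.


Solution: Pick $M > 0$. Let $\delta = 1/\sqrt{M}$. Then if $0 < |x| < \delta$, it follows that

$$\frac{1}{x^2} = \frac{1}{|x|^2} > \frac{1}{\delta^2} = M. \tag{5.20}$$

∎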
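The choice $\delta = 1/\sqrt{M}$ from Example 5.4.6 can be spot-checked the same way an $\epsilon$-$\delta$ choice can. The sketch below samples a finite grid of nonzero points with $|x| < \delta$ and confirms that $1/x^2$ exceeds $M$ at each; the helper names `delta_for` and `exceeds` are invented for this illustration.

```python
import math


def delta_for(M):
    """The delta chosen in Example 5.4.6: delta = 1 / sqrt(M)."""
    return 1 / math.sqrt(M)


def exceeds(M, n=1000):
    """Check 1/x^2 > M on a grid of points with 0 < |x| < delta."""
    d = delta_for(M)
    xs = [s * d * k / (n + 1) for k in range(1, n + 1) for s in (1, -1)]
    return all(1 / x**2 > M for x in xs)


for M in (0.25, 4.0, 1e6):
    print(M, delta_for(M), exceeds(M))
```

Here the roles are reversed compared with a finite limit: a larger target height $M$ demands a smaller $\delta$, just as a smaller $\epsilon$ does.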


From Definition 5.4.5 and an argument such as that for Theorem 5.3.2, we have the 
following: 


Theorem 5.4.7. For a function $f$, $\lim_{x \to a} f(x) = +\infty$ if and only if

$$\lim_{x \to a^+} f(x) = \lim_{x \to a^-} f(x) = +\infty. \tag{5.21}$$


What other kinds of theorems can we expect for limits of infinity? Can we prove 
something that resembles Theorem 5.4.4? The answer is “yes,” but the theorems will 
look different because the limits are not necessarily real numbers. Consequently, we'll 
have to begin with Definition 5.4.5 and do some of the work from scratch. The parts 
of Theorem 5.4.8 in what follows are ordered so that some of the subsequent ones 
follow quickly from the earlier ones. This should make your work a little more efficient.
Theorems 5.4.8 and 5.4.9 could just as easily be stated and proved in terms of one-sided 
limits. 


Theorem 5.4.8. Suppose $f(x) \to L$, $g(x) \to +\infty$, and $h(x) \to +\infty$ as $x \to a$. Then
as $x \to a$,
1. $-g(x) \to -\infty$,
2. $1/g(x) \to 0$,
3. $f(x) + g(x) \to +\infty$,
4. $f(x)g(x) \to +\infty$ if $L > 0$,
5. $f(x)g(x) \to -\infty$ if $L < 0$,
6. No conclusion can be drawn about $f(x)g(x)$ if $L = 0$,
7. $f(x)/g(x) \to 0$,
8. $g(x) + h(x) \to +\infty$,
9. $g(x)h(x) \to +\infty$,
10. No conclusion can be drawn about $g(x) - h(x)$ or $g(x)/h(x)$.


If $f(x) \to 0$ as $x \to a$, then we can say something about $1/f(x)$ under certain
circumstances.


Theorem 5.4.9. Suppose there exists a deleted $\delta$-neighborhood of $a$ such that $f(x) > 0$ for
all $x \in DN_\delta(a)$. Suppose also that $f(x) \to 0$ as $x \to a$. Then $1/f(x) \to +\infty$. Similarly,
if $f(x) < 0$ for all $x \in DN_\delta(a)$ and $f(x) \to 0$, then $1/f(x) \to -\infty$.


In the same way that we use the notation $x \to a^+$ and $x \to a^-$ to mean $x$ approaches
$a$ from the right and left, respectively, we can create a shorthand notation for functions
that behave as in Theorem 5.4.9. We write $f(x) \to 0^+$ to mean that $f(x)$ approaches
zero and is positive on a deleted neighborhood of $a$, and we say $f(x)$ approaches zero
through positive values. Similarly, we write $f(x) \to 0^-$ to mean that $f(x)$ approaches
zero and is negative on a deleted neighborhood of $a$, and we say $f(x)$ approaches zero
through negative values. Theorem 5.4.9 could be stated in terms of one-sided limits also.


EXERCISES 


1. Use similar triangles and the equation of the circle from Fig. 5.4 to determine the
$xy$-coordinates of the image of $a \in \mathbb{R}$ under the bijection that sends $\mathbb{R} \cup \{\infty\}$ to the
circle.


2. Prove Theorem 5.4.7: For a function $f$, $\lim_{x \to a} f(x) = +\infty$ if and only if

$$\lim_{x \to a^+} f(x) = \lim_{x \to a^-} f(x) = +\infty. \tag{5.22}$$




3. Prove Theorem 5.4.8: Suppose $f(x) \to L$, $g(x) \to +\infty$, and $h(x) \to +\infty$ as $x \to a$.
Then as $x \to a$,

(a) $-g(x) \to -\infty$,
(b) $1/g(x) \to 0$,
(c) $f(x) + g(x) \to +\infty$,
(d) $f(x)g(x) \to +\infty$ if $L > 0$,
(e) $f(x)g(x) \to -\infty$ if $L < 0$,
(f) No conclusion can be drawn about $f(x)g(x)$ if $L = 0$,
(g) $f(x)/g(x) \to 0$,
(h) $g(x) + h(x) \to +\infty$,
(i) $g(x)h(x) \to +\infty$,
(j) No conclusion can be drawn about $g(x) - h(x)$ or $g(x)/h(x)$.

4. Prove Theorem 5.4.9: Suppose there exists a deleted $\delta$-neighborhood of $a$ such that
$f(x) > 0$ for all $x \in DN_\delta(a)$. Suppose also that $f(x) \to 0$ as $x \to a$. Then
$1/f(x) \to +\infty$. Similarly, if $f(x) < 0$ for all $x \in DN_\delta(a)$ and $f(x) \to 0$, then
$1/f(x) \to -\infty$.


5. Let $f(x) = (x^2 - 4)/(x - 1)$. Find with verification $\lim_{x \to 1^-} f(x)$ and $\lim_{x \to 1^+} f(x)$.


6. In the spirit of Definitions 5.4.1 and 5.4.5, create definitions for the following terms
involving limits:

(a) $\lim_{x \to +\infty} f(x) = +\infty$;
(b) $\lim_{x \to +\infty} f(x) = -\infty$;
(c) $\lim_{x \to -\infty} f(x) = +\infty$;
(d) $\lim_{x \to -\infty} f(x) = -\infty$.


5.5 Continuity


The word continuous is possibly already a part of your vocabulary of functions. It might
be that your calculus class delved into continuity enough to provide an $\epsilon$-$\delta$ definition.
More than likely, your notions of continuity are best summarized as a belief
that a continuous function can be sketched without lifting your pencil off the paper.
The graph is one clean, easily drawable piece. Even though such a view can be helpful
in your understanding of some characteristics of continuity, it is far from true that
all continuous functions are so easily drawable. The bizarre examples of undrawable
continuous functions will come later in your study of mathematics. For now, we define
the terms and study the basic results. We begin with continuity at a single point in the
domain of $f$, then consider continuity on a subset of the domain. Then, in the same way
that we talk about left-hand and right-hand limits, we'll talk about left continuity and
right continuity.
5.5.1 Continuity at a point

If $\lim_{x \to a} f(x)$ exists, it describes the behavior of $f$ near $a$ but not at $a$. It’s possible that
$\lim_{x \to a} f(x) = L$, while $f(a)$ might either fail to exist or exist and be different from $L$.
If $\lim_{x \to a} f(x)$ exists and is the value of $f(a)$, we give that phenomenon a name.


Definition 5.5.1. Suppose $f : S \to \mathbb{R}$ and $a \in \mathrm{Int}(S)$. We say that $f$ is continuous at
$a$ if

$$\lim_{x \to a} f(x) = f(a). \tag{5.23}$$

If $f$ is not continuous at $a$, we say $f$ is discontinuous there, or that $f$ has a discontinuity at
$a$.


Let’s reword Definition 5.5.1 in the language of $\epsilon$, $\delta$, and neighborhoods. Continuity
is a small step logically from limit, for all we do is delete the word deleted when we discuss
neighborhoods of $a$. Definition 5.5.1 can be reworded to state that $f$ is continuous at $a$
if

$$(\forall \epsilon > 0)(\exists \delta > 0)(\forall x \in S)(|x - a| < \delta \rightarrow |f(x) - f(a)| < \epsilon). \tag{5.24}$$

Compare the logical statement (5.24) to the symbolic form of the definition of limit in
statement (5.5). The only significant difference between these logical statements is
that $|x - a| < \delta$ in (5.24) replaces $0 < |x - a| < \delta$ in (5.5). And since $|x - a| < \delta$ is weaker
than $0 < |x - a| < \delta$, continuity is stronger than the existence of a limit.

Because of the short step from limit to continuity, our work with limits in Section 5.2 
takes us a long way in the theory of continuous functions. If we go back to the theorems 
from Section 5.2, replacing limits L with f(a) and deleting the word deleted, we arrive 
immediately at the following theorems: 


Theorem 5.5.2. The constant function $f(x) = c$ is continuous at every $a \in \mathbb{R}$.


Theorem 5.5.3. The identity function $i(x) = x$ is continuous at every $a \in \mathbb{R}$.


Theorem 5.5.4. If $f$ is continuous at $a$, then there exists a neighborhood of $a$ where $f$ is
bounded.


Theorem 5.5.5. If $f$ and $g$ are both continuous at $a$, then so are $f + g$, $fg$, and $f - g$.


Theorem 5.5.6. Suppose $g$ is continuous at $a$ and $g(a) \ne 0$. Then there exists a neighborhood
of $a$ where $g$ can be bounded away from zero. In particular, if $g(a) > 0$, there exist
$M > 0$ and $\delta > 0$ such that $|x - a| < \delta$ implies $g(x) > M$. If $g(a) < 0$, there exist $M < 0$
and $\delta > 0$ such that $|x - a| < \delta$ implies $g(x) < M$.


Theorem 5.5.7. If $f$ and $g$ are both continuous at $a$, and if $g(a) \ne 0$, then $f/g$ is
continuous at $a$.


Corollary 5.5.8. A polynomial function is continuous at every $a \in \mathbb{R}$. Every rational
function $f(x) = P_1(x)/P_2(x)$, where $P_1$ and $P_2$ are polynomials, is continuous at every
$a \in \mathbb{R}$ for which $P_2(a) \ne 0$.




Theorem 5.5.9. A function f : S → R is continuous at a ∈ S if and only if every sequence
(aₙ) such that aₙ → a satisfies f(aₙ) → f(a).


Notice that the hypothesis condition of Theorem 5.5.9 does not require aₙ ≠ a as
does Theorem 5.3.4. For if any of the aₙ = a, then f(aₙ) = f(a), so that the sequence
(f(aₙ)) is defined for all n.
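
The sequential criterion of Theorem 5.5.9 can be probed numerically. The sketch below is only an illustration, since a finite tail of a sequence can suggest but never prove convergence. The names step and tail_gap, and the choice of sequences ±1/n, are our own and do not come from the text.

```python
def step(x: float) -> float:
    """A function with a jump at 0: it is 0 for x <= 0 and 1 for x > 0."""
    return 1.0 if x > 0.0 else 0.0


def tail_gap(f, a: float, seq, start: int = 1000, count: int = 1000) -> float:
    """Largest |f(a_n) - f(a)| over n = start, ..., start + count - 1."""
    return max(abs(f(seq(n)) - f(a)) for n in range(start, start + count))


# a_n = 1/n tends to 0, yet step(a_n) stays 1 while step(0) = 0.
print(tail_gap(step, 0.0, lambda n: 1.0 / n))
# For x**2 the same sequence gives values that shrink toward f(0) = 0.
print(tail_gap(lambda x: x * x, 0.0, lambda n: 1.0 / n))
```

Note that the sequence −1/n would give a gap of zero for step, which is why the theorem demands the condition for every sequence converging to a, not just one.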

Here’s an important theorem we could not address with limits alone. You'll prove it 
in Exercise 1. 


Theorem 5.5.10. Suppose f is continuous at a and g is continuous at f(a). Then g ∘ f
is continuous at a.


To prove Theorem 5.5.10, an arbitrarily chosen ε-neighborhood of g[f(a)]
produces a δ₁-neighborhood of f(a), and this δ₁-neighborhood of f(a) produces a
δ₂-neighborhood of a. Another way to write what Theorem 5.5.10 states is


$$\lim_{x \to a} g[f(x)] = g\left[\lim_{x \to a} f(x)\right] = g\left[f\left(\lim_{x \to a} x\right)\right] = g[f(a)]. \tag{5.25}$$


With a little more mathematical machinery than we have up to now, Theorem 5.5.10 
comes in handy in certain manipulations of limits. If we assume for the moment that 
all the functions in the following are continuous at the points involved, Theorem 5.5.10 
would allow us to write something like 


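$$\begin{aligned}
\lim_{x \to 3} \sqrt{1 + e^{1 - x^2}} &= \sqrt{\lim_{x \to 3} \left(1 + e^{1 - x^2}\right)} \\
&= \sqrt{1 + \lim_{x \to 3} e^{1 - x^2}} \\
&= \sqrt{1 + e^{\lim_{x \to 3} (1 - x^2)}} \\
&= \sqrt{1 + e^{-8}}.
\end{aligned} \tag{5.26}$$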


The reason we could not have a theorem in Section 5.2 that said something like “If
f(x) → L₁ as x → a and g(x) → L₂ as x → L₁, then (g ∘ f)(x) → L₂ as x → a”
is that a possible hole in the domain of g at f(a) might cause troublesome gaps in the
domain of g ∘ f around a. For example, let f(x) = 0 and g(x) = sin x/x. Then g ∘ f
is not defined at any x ∈ R.

If f is discontinuous at a it could be for one or more of three basic reasons. If 
Eq. (5.23) is not satisfied it might be that: 


(D1) $\lim_{x \to a} f(x)$ does not exist;
(D2) f(a) does not exist; or
(D3) $\lim_{x \to a} f(x)$ and f(a) both exist, but are not equal.


Figure 5.5 illustrates some of these possibilities at a = 1, 2, 3, 4, 5. At a = 1, f behaves
much like sin(1/x) near x = 0. (See Exercise 11 from Section 5.2.) In Exercise 2, you'll
be asked to state which of the possibilities D1–D3 explains the discontinuities of f. The
discontinuities at 2 and 4 are called removable discontinuities because it is possible to 
define or redefine the value of f there to make it continuous there. The discontinuity at 
5 is called a jump discontinuity. 


Figure 5.5 Some examples of discontinuities. 


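Example 5.5.11. The function f(x) = sin x/x has a removable discontinuity at zero.
Thus the function


$$g(x) = \begin{cases} \dfrac{\sin x}{x}, & \text{if } x \neq 0; \\ 1, & \text{if } x = 0 \end{cases} \tag{5.27}$$


is continuous at a = 0 because $\lim_{x \to 0} g(x) = \lim_{x \to 0} f(x) = 1 = g(0)$.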
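
As a numerical illustration of Example 5.5.11 (a sketch only, not a proof), the snippet below defines the patched function from Eq. (5.27) and measures how far its values stray from g(0) = 1 on shrinking intervals around zero. The names g_patched and max_gap_near_zero and the sampling scheme are our own assumptions.

```python
import math


def g_patched(x: float) -> float:
    """sin(x)/x with the hole at x = 0 filled in by the limiting value 1."""
    if x == 0.0:
        return 1.0
    return math.sin(x) / x


def max_gap_near_zero(radius: float, samples: int = 1000) -> float:
    """Largest |g(x) - g(0)| over evenly spaced x in [-radius, radius]."""
    gap = 0.0
    for k in range(samples + 1):
        x = -radius + 2.0 * radius * k / samples
        gap = max(gap, abs(g_patched(x) - g_patched(0.0)))
    return gap


# The gap shrinks as the interval around 0 shrinks.
for r in (1.0, 0.1, 0.01):
    print(r, max_gap_near_zero(r))
```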


5.5.2 Continuity on a set


The definition of continuity of f : S → R on a subset A ⊂ S is fairly straightforward if
A is open, for then we are guaranteed that every a ∈ A is contained in a neighborhood
entirely within the domain of f.


Definition 5.5.12. If f : S → R is a function and A ⊂ S is open, we say f is continuous
on A if it is continuous at every a ∈ A. Logically, we may write this as


$$(\forall a \in A)(\forall \varepsilon > 0)(\exists \delta > 0)(\forall x \in A)(|x - a| < \delta \rightarrow |f(x) - f(a)| < \varepsilon). \tag{5.28}$$


Notice the only difference between the logical statements (5.24) and (5.28) is that the
latter begins with (∀a ∈ A). To construct a proof that f is continuous on a set A by going
back to ε and δ, you'll begin by picking both a ∈ A and ε > 0. We want to do precisely this
in an extended example now. We want to show that f(x) = 1/x is continuous on (0, 1).
Yes, it can be said that continuity of f(x) = 1/x on (0, 1) follows from Corollary 5.5.8.
However, we go back to an ε-δ proof as a valuable learning experience. There are plenty
of functions whose continuity we have not addressed. As you work your way through the
details of this demonstration, you're going to find it to be full of algebraic manipulations
and observations, none of which is particularly difficult to understand. However, you're
likely going to be wondering how anyone would ever know these are the right steps to take.
Be patient; these things come with experience. However, do make sure you understand
all the steps along the way. We'll present the details in the same order in which you would
discover them if you were starting from scratch and had the know-how to stumble in the
right direction. We’re trying to illustrate three points. First, sometimes you have to stand
on your head, especially in ε-δ proofs. Second, even though you have to choose ε > 0
arbitrarily, there are times when you can make convenient assumptions about ε that lose
no generality and make your work easier. Third, and most importantly, finding a suitable
δ > 0 for a given a ∈ A and ε > 0 will depend on the values of both a and ε. Digging
through this work will prepare you for Exercise 3, where you'll show that f(x) = √x is
continuous on (0, ∞).

To begin the construction of a proof that f(x) = 1/x is continuous on (0, 1), pick
a ∈ (0, 1) and ε > 0. As usual in an ε-δ proof, we must find δ > 0 so that the inequality


$$|x - a| < \delta \tag{5.29}$$


will be at least as strong an inequality as


$$\left| \frac{1}{x} - \frac{1}{a} \right| < \varepsilon. \tag{5.30}$$


Thus we work backwards from Eq. (5.30), trying to transform it into something of the
form of inequality (5.29), making sure that the steps we take do not produce inequalities
that become any weaker. Unfolding Eq. (5.30) and trying to convert it to something of
the form a − δ < x < a + δ, we have


$$-\varepsilon < \frac{1}{x} - \frac{1}{a} < \varepsilon$$


$$\frac{1}{a} - \varepsilon < \frac{1}{x} < \frac{1}{a} + \varepsilon. \tag{5.31}$$

At this point, we’re tempted simply to take the reciprocal of everything in Eq. (5.31) 
to have an x in the middle, but we have to be careful about the signs. In Exercise 11 of 
Section 2.3, you showed the following: 


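$$\begin{aligned}
0 < \alpha < \beta &\;\rightarrow\; \frac{1}{\alpha} > \frac{1}{\beta}, \\
\alpha < 0 < \beta &\;\rightarrow\; \frac{1}{\alpha} < \frac{1}{\beta}, \\
\alpha < \beta < 0 &\;\rightarrow\; \frac{1}{\alpha} > \frac{1}{\beta}.
\end{aligned} \tag{5.32}$$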


Looking at all the terms in Eq. (5.31), we’re sure that 1/a + ε > 0. Also, the unknown x
will be positive because a > 0, and we'll make sure that our δ-neighborhood of a is
small enough to include only positive numbers. The problem is 1/a − ε, which could
be positive or negative, depending on the size of ε. In fact, 1/a − ε > 0 if and only
if ε < 1/a. This might seem troublesome at first, but in actuality we can ignore the
problem and assume 1/a − ε > 0. Here’s why.

If someone presents us with an ε > 0 to serve as the radius of a target on the y-axis
that we must hit from the x-axis, then we are certainly free to aim for a target of smaller
radius. If we make the statement “Pick ε > 0,” and discover that values of ε in excess of
1/a make things inconvenient, then we could freely replace ε with some ε₁ that satisfies
ε₁ < min{ε, 1/a}. Proceeding from there to find δ, we would be safe because


$$\left| \frac{1}{x} - \frac{1}{a} \right| < \varepsilon_1 < \varepsilon. \tag{5.33}$$


Thus, finding a δ > 0 that guarantees |1/x − 1/a| < ε₁ will also guarantee that Eq. (5.30)
is true. This all boils down to the fact that we’re free to write: “We may assume without
loss of generality that ε < 1/a.” The rest of the work is then easier.

Continuing from Eq. (5.31) and applying Eq. (5.32), we have 


$$\frac{a}{1 + \varepsilon a} < x < \frac{a}{1 - \varepsilon a}. \tag{5.34}$$


If we're trying to transform this into something of the form a − δ < x < a + δ, we
still have some work to do because Eq. (5.34) does not suggest a value of δ in its present
form. This is where we do some sneaky algebra involving Eq. (2.67) from Section 2.4,
Exercise 10. This is just one of many places where Eq. (2.67) demonstrates its usefulness.
If we rearrange Eq. (2.67) and write it two ways, once letting x = y and once letting
x = −y, we have


$$1 + y + y^2 + \cdots + y^n = \frac{1 - y^{n+1}}{1 - y} \tag{5.35}$$

and

$$1 - y + y^2 - \cdots + (-y)^n = \frac{1 - (-y)^{n+1}}{1 + y}. \tag{5.36}$$

If we know that 0 < y < 1, then letting n = 1 in Eq. (5.35) yields

$$1 + y = \frac{1 - y^2}{1 - y} < \frac{1}{1 - y}. \tag{5.37}$$

Letting n = 2 in Eq. (5.36), we have

$$1 - y(1 - y) = 1 - y + y^2 = \frac{1 + y^3}{1 + y} > \frac{1}{1 + y}. \tag{5.38}$$

Since 0 < ε < 1/a, we have 0 < εa < 1, so that Eqs. (5.37) and (5.38) both hold for
y = εa. Thus


$$\frac{1}{1 + \varepsilon a} < 1 - \varepsilon a(1 - \varepsilon a) \quad \text{and} \quad 1 + \varepsilon a < \frac{1}{1 - \varepsilon a}. \tag{5.39}$$


Now we're trying to guarantee that Eq. (5.34) is true, for then Eq. (5.30) will be true. 

If we take both inequalities from Eq. (5.39) and multiply them through by a > 0, we 
have the following: 

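$$\frac{a}{1 + \varepsilon a} < a - \varepsilon a^2(1 - \varepsilon a) \quad \text{and} \quad a + \varepsilon a^2 < \frac{a}{1 - \varepsilon a}. \tag{5.40}$$

Thus, if

$$a - \varepsilon a^2(1 - \varepsilon a) < x < a + \varepsilon a^2, \tag{5.41}$$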


then Eq. (5.34) will also be true. We’re almost there. Inequality (5.41) does not quite
look like something of the form a − δ < x < a + δ. It looks more like something
of the form a − δ₁ < x < a + δ₂. However, since a > 0, ε > 0, and εa < 1, then
0 < 1 − εa < 1. Thus, εa²(1 − εa) is a smaller positive number than εa² so that the
left-hand side of inequality (5.41) is closer to a than the right-hand side. Thus, we should
let δ = εa²(1 − εa), and |x − a| < δ will imply Eq. (5.41), which will imply Eq. (5.34),
which will imply Eq. (5.30). This is the heart of all the scratchwork, and we can then
present all this in the form of a cleaned-up proof. Because of all the effort we exerted in
working through this, let’s call it a theorem.


Theorem 5.5.13. The function defined by f(x) = 1/x is continuous on (0, 1). 


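Proof: Pick a ∈ (0, 1) and ε > 0. Without loss of generality, we may assume
ε < 1/a. Let δ = εa²(1 − εa), which is clearly positive. Then if |x − a| < δ, we have


$$a - \varepsilon a^2(1 - \varepsilon a) < x < a + \varepsilon a^2(1 - \varepsilon a). \tag{5.42}$$

Because

$$\frac{a}{1 + \varepsilon a} < a(1 - \varepsilon a + \varepsilon^2 a^2) = a - \varepsilon a^2(1 - \varepsilon a) \tag{5.43}$$

and

$$a + \varepsilon a^2(1 - \varepsilon a) < a + \varepsilon a^2 = a(1 + \varepsilon a) < \frac{a}{1 - \varepsilon a}, \tag{5.44}$$

we have

$$\frac{a}{1 + \varepsilon a} < x < \frac{a}{1 - \varepsilon a}. \tag{5.45}$$

Since every term in Eq. (5.45) is positive, we may reciprocate to have

$$\frac{1}{a} + \varepsilon = \frac{1 + \varepsilon a}{a} > \frac{1}{x} > \frac{1 - \varepsilon a}{a} = \frac{1}{a} - \varepsilon, \tag{5.46}$$

or

$$|f(x) - f(a)| < \varepsilon. \tag{5.47}$$

Thus, f is continuous at a. ∎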
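
The choice δ = εa²(1 − εa) from the proof can be spot-checked numerically. The following is a minimal sketch rather than a verification: it samples points x with |x − a| < δ and tests Eq. (5.30). The function names are hypothetical, and the cap eps ≤ 0.5/a is one convenient way (our assumption) of carrying out the "without loss of generality ε < 1/a" step.

```python
import random


def delta_for_reciprocal(a: float, eps: float) -> float:
    """The delta from the proof of Theorem 5.5.13 for f(x) = 1/x at a."""
    if not (0.0 < a < 1.0) or eps <= 0.0:
        raise ValueError("need 0 < a < 1 and eps > 0")
    eps = min(eps, 0.5 / a)  # WLOG step: shrink eps so that eps * a < 1
    return eps * a * a * (1.0 - eps * a)


def delta_works(a: float, eps: float, trials: int = 10_000, seed: int = 0) -> bool:
    """Sample x with |x - a| < delta and test |1/x - 1/a| < eps."""
    rng = random.Random(seed)
    d = delta_for_reciprocal(a, eps)
    for _ in range(trials):
        # Stay strictly inside the delta-neighborhood of a.
        x = a + d * 0.999999 * rng.uniform(-1.0, 1.0)
        if abs(1.0 / x - 1.0 / a) >= eps:
            return False
    return True


print(delta_for_reciprocal(0.5, 0.1), delta_works(0.5, 0.1))
```

A passing check for a handful of pairs (a, ε) is consistent with the theorem, but only the proof covers all of them.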


Notice a very important fact. The value of δ we proposed does depend on both ε
and a. In fact, the closer a is to zero, the smaller must be our value of δ. On the basis
of our work, it appears there is no single value of δ that would depend only on ε and
work for all a ∈ (0, 1). It might occur to you, however, that someone else might have
taken a different approach and stumbled across a value of δ that was not tied to the value
of a but depended only on ε. The question is an important one and will be answered
when we study uniform continuity in Section 5.7. Suffice it to say for now that no such
δ as a function of ε alone exists for f(x) = 1/x on (0, 1). Intuitively, it might seem
plausible when we consider the vertical asymptote 1/x has as x → 0⁺. Look at Fig. 5.6
and imagine setting an ε-tolerance around a point 1/a on the y-axis. If a is very close
to zero, the graph of f is very steep around (a, 1/a) and varies considerably even in a
small neighborhood of a. Granted, for any a there is a δ-neighborhood of a that maps
into $N_\varepsilon[f(a)]$, but if you imagine smaller and smaller values of a, then the asymptote of
the graph necessitates smaller and smaller values of δ. No single δ-value will work for all
a ∈ (0, 1) because of this asymptote.


Figure 5.6 f(x) = 1/x. [Graph marking a on the x-axis and 1/a − ε, 1/a, 1/a + ε on the y-axis.]
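
To see the dependence of δ on a concretely, one can compute the largest δ that works at each a. By Eq. (5.34), the left endpoint a/(1 + εa) is the binding constraint, so the largest admissible radius is a − a/(1 + εa) = εa²/(1 + εa). This closed form is our own observation drawn from (5.34), and the function name largest_delta is hypothetical; the sketch simply tabulates it.

```python
def largest_delta(a: float, eps: float) -> float:
    """Largest delta with |x - a| < delta  =>  |1/x - 1/a| < eps, for a > 0.

    The left endpoint a/(1 + eps*a) of the interval in Eq. (5.34) is the
    nearer one to a, so delta_max = a - a/(1 + eps*a).
    """
    return eps * a * a / (1.0 + eps * a)


eps = 0.1
for a in (0.5, 0.1, 0.01, 0.001):
    print(f"a = {a:<6} largest delta = {largest_delta(a, eps):.3e}")
```

With ε held fixed, the printed values shrink roughly like εa², which is the numerical face of the claim that no δ depending on ε alone can serve every a in (0, 1).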

The scratchwork you'll go through in Exercise 3 to show √x is continuous on (0, ∞)
is not nearly as involved as it has been for our example here, but you will learn a bit more
about using the inequality f(a) − ε < f(x) < f(a) + ε and working backwards to find an
inequality of the form a − δ < x < a + δ that is at least as strong. A peculiar thing for √x
on (0, ∞), however, is that it actually is possible to find a δ that is independent of the value
of a. Although it is true that the graph of √x becomes very steep as x → 0⁺, it does not
behave asymptotically there. Why such a δ can be found will become clear in Section 5.7.


5.5.3 One-sided continuity 


Even if $\lim_{x \to a} f(x)$ does not exist, there might still be a hope that either $\lim_{x \to a^-} f(x)$
or $\lim_{x \to a^+} f(x)$ exists. Similarly, although f might not be continuous at a, we might
have continuity from one side or the other, as it were.


Definition 5.5.14. A function f is said to be left continuous at a if


$$\lim_{x \to a^-} f(x) = f(a). \tag{5.48}$$

Similarly, f is said to be right continuous at a if


$$\lim_{x \to a^+} f(x) = f(a). \tag{5.49}$$




An immediate consequence of Theorem 5.3.2 is the following. 
Theorem 5.5.15. A function f is continuous at a if and only if it is both left and right
continuous at a.


In defining continuity on A, Definition 5.5.12 stipulated that A must be open. This
allowed for a definition of continuity on (a, b), but not on [a, b]. One-sided continuity
now allows us to define continuity on [a, b] in such a way that we do not concern ourselves
with how f behaves or whether it even exists for x < a or x > b.


Definition 5.5.16. A function f : S → R is said to be continuous on [a, b] ⊂ S if the
following hold: 


1. f is continuous on (a, b); 
2. f is right continuous at a; and 


3. f is left continuous at b. 


Definition 5.5.16 can be naturally adapted to apply to the following example by 
omitting stipulation 3. 


Example 5.5.17. f(x) = √x is continuous on [0, +∞). From Exercise 3, f is continuous
at all a ∈ (0, ∞). From Exercise 2 in Section 5.3, $\lim_{x \to 0^+} f(x) = f(0)$.


EXERCISES 


1. Prove Theorem 5.5.10: Suppose f is continuous at a and g is continuous at f(a).
Then g ∘ f is continuous at a.


2. For f sketched in Fig. 5.5, state which of the characteristics D1–D3 describes the
discontinuity of f at x = 1, 2, 3, 4, 5.


3. Use an ε-δ proof to show that f(x) = √x is continuous at any a > 0.


5.6 Implications of Continuity 


Having defined and illustrated continuity in Section 5.5, let’s look at some of its 
implications. 


5.6.1 The intermediate value theorem 


The imagery of a continuous function being drawable without picking up the pencil 
makes this first theorem seem plausible. It says, in short, that a continuous function 
cannot be negative at one point and positive somewhere else without crossing the x-axis 
somewhere in between the two points. 
Theorem 5.6.1. Suppose a < b, f(a) < 0 < f(b), and f is continuous on [a, b]. Then
there exists c ∈ (a, b) such that f(c) = 0.


The proof of Theorem 5.6.1 (which you'll supply in Exercise 1) is yet another nice 
application of the LUB property of R. There will be a hint if you need it. Suffice it to say 
for now that if you imagine yourself standing on the x-axis at x = a and looking up the 
x-axis, there’s a natural subset of R that is both nonempty and bounded from above by b 
to which the LUB property can apply. The LUB of this set is the number you need. If g is 
a continuous function such that g(a) > 0 > g(b) for some a < b, then Theorem 5.6.1 
can be applied to —g to produce some c € (a, b) such that g(c) = 0. 

Theorem 5.6.1 provides most of the machinery needed to prove a theorem whose 
name you might remember from calculus—the Intermediate Value Theorem. We'll sup- 
ply the proof here to illustrate a rather common convenience in mathematics. If you can 
derive some result about a real-valued function g involving zero, then you might be able 
to derive a similar, more general result about other functions by altering them to fit the 
hypothesis conditions of the theorem applied to g. 


Theorem 5.6.2 (Intermediate Value Theorem [IVT]). Suppose f is continuous on
[a, b], and suppose f(a) < f(b). Let y₀ ∈ R be any number satisfying f(a) < y₀ < f(b).
Then there exists c ∈ (a, b) such that f(c) = y₀.


In the IVT, f does not necessarily change from negative to positive as x runs from a 
to b. However, if we create a new function g by dropping or raising f to make g(a) < 
0 < g(b), then we can apply Theorem 5.6.1 to g and see what it says about f. Here is 
the proof. 


Proof: Define g(x) = f(x) − y₀. Since f and the constant function y₀ are both
continuous on [a, b], Theorem 5.5.5 and our work with one-sided continuity imply g
is continuous on [a, b]. Furthermore, g(a) = f(a) − y₀ < 0 and g(b) = f(b) − y₀ >
0. By Theorem 5.6.1, there exists c ∈ (a, b) such that g(c) = 0. Thus f(c) = y₀.
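
The IVT only asserts that c exists. A standard numerical companion, which is not the LUB argument the text has in mind for Theorem 5.6.1, is bisection: keep halving an interval on which g = f − y₀ changes sign. The sketch below assumes f is continuous with f(a) < y₀ < f(b); the name bisect_value and the tolerance are our own choices.

```python
from typing import Callable


def bisect_value(f: Callable[[float], float], a: float, b: float,
                 y0: float, tol: float = 1e-12) -> float:
    """Approximate some c in (a, b) with f(c) = y0.

    Assumes f is continuous on [a, b] and f(a) < y0 < f(b), as in the IVT.
    """
    def g(x: float) -> float:
        return f(x) - y0  # the same shift used in the proof of the IVT

    if not (a < b and g(a) < 0.0 < g(b)):
        raise ValueError("need a < b and f(a) < y0 < f(b)")
    lo, hi = a, b  # invariant: g(lo) < 0 <= g(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid == lo or mid == hi:  # floating-point resolution reached
            break
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Solve x**3 = 5 on [0, 2]; the result approximates the cube root of 5.
print(bisect_value(lambda x: x ** 3, 0.0, 2.0, 5.0))
```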


Theorem 5.1.4 with its Corollary 5.1.5 demonstrated that strict monotonicity of a 
function on a set implies one-to-oneness (hence invertibility) there. The converse of 
Corollary 5.1.5 is clearly not true, as is illustrated by f(x) = 1/x on R\{0}, which is 
one-to-one but not monotone. However, if f is one-to-one and continuous on A, then 
it must be strictly monotone. You will prove the following theorem in Exercise 4. The 
IVT will come in handy. 


Theorem 5.6.3. If f : S → R is continuous and one-to-one on [a, b] ⊂ S, then f is
strictly monotone on [a, b].


The IVT can also help us prove that a continuous, invertible function has a continuous
inverse. Specifically, if f is continuous and invertible on [a, b], then it is continuous
and one-to-one. By Theorem 5.6.3, f is strictly monotone on [a, b] also. By picking
d ∈ f([a, b]), writing d = f(c), and choosing ε > 0, you should be able to find a
δ-neighborhood of d such that d − δ < x < d + δ implies c − ε < f⁻¹(x) < c + ε.
(Draw a picture!) You'll do precisely this in Exercise 5. Here’s the statement of the
theorem.




Theorem 5.6.4. Suppose f : S → R is continuous and invertible on [a, b] ⊆ S. Then
f⁻¹ : f([a, b]) → [a, b] is continuous on f([a, b]).


5.6.2 Continuity and open sets 
The logical equivalence of the ε-δ form of continuity and the sequential limit form conveyed
by Theorem 5.5.9 is useful not only because it gives us freedom to exchange one
property for another, but also because it suggests an alternative way to define continuity
that might be preferred in the creation of some mathematical structures. Theorem 5.6.5
provides another statement that is logically equivalent to continuity, this time in terms of
the pre-images of open subsets of R. This result does not immediately have the intuitive
appeal of the ε-δ definition in terms of the graph being drawable without picking up the
pencil. This is arguably a good thing, for that feature of drawability is not really valid.
In the theorems that follow, we’re going to assume the functions involved are defined
on all of R. This will keep the proofs relatively simple and get the point across, though
similar theorems can be addressed on restricted domains.


Theorem 5.6.5. A function f : R → R is continuous if and only if the pre-image of every
open set is open.


When you prove Theorem 5.6.5 in Exercise 6, you can write a very elegant proof if
you'll use a slightly different form of the definition of continuity than that in Eq. (5.24).
The statement |x − a| < δ → |f(x) − f(a)| < ε is equivalent to
$x \in N_\delta(a) \rightarrow f(x) \in N_\varepsilon[f(a)]$, or if you prefer


$$f[N_\delta(a)] \subseteq N_\varepsilon[f(a)]. \tag{5.50}$$


If you will use this form of the definition of continuity along with the standard definition 
of open set, you'll find that Exercises 6 and 7 from Section 3.2 make for some interesting 
manipulations of the sets involved in the proof. From Theorem 5.6.5, the following should 
drop right into your lap (Exercise 7). 


Corollary 5.6.6. A function f : R → R is continuous if and only if the pre-image of every
closed set is closed.


Let’s digress just for a moment to comment on the significance of Theorem 5.6.5. 
First, the logical equivalence of the two statements in Theorem 5.6.5 suggests yet another 
place where one might begin in defining continuity of a function. If one were to begin 
by defining a function to be continuous provided the pre-image of every open set is 
open, then our ε-δ definition and the sequential limit property of Theorem 5.3.4 would
become theorems in the analysis of real numbers. If it seems a little unnatural to begin 
with such an open set pre-image definition, consider the following, which is a glimpse 
into topology. 

At the beginning of Chapter 4, we said that a defining characteristic of analysis is 
that elements of a set have either a measure of size (norm) or of distance between them 
(metric). In R, the measure most commonly used is absolute value, so that |x| is the size 
of x ∈ R and |a − b| is the distance between a and b. If S has a metric, then a neighborhood
of a is defined as all points within a certain distance of a. Then a definition of open
set as Definition 4.2.2 becomes meaningful because openness is defined in terms of 
neighborhoods. 

In topology there is no such notion of size or distance either defined or assumed on 
S. Instead, we begin with the idea of open set in what might seem a peculiar way. Given 
S, we declare some subsets to be open just by definition. If you want to declare every 
subset of S open, fine; however, that might not prove to be especially interesting. More 
than likely you'll want some subsets of S to be open and some not to be open. What
prevents your freedom to define open sets however you want from degenerating into
mathematically useless anarchy is that your collection of open sets needs to have some of
the same properties of open sets we derived as theorems for R. Specifically, if we lump all
the subsets of S that we declare to be open into the family O, then in forming a topology
we insist that: 


(T1) both ∅ and S are members of O;


(T2) if F ⊂ O is a family of open sets, then $\bigcup_{A \in F} A$ is open. That is, O is closed under
union; and


(T3) if {Aₖ : 1 ≤ k ≤ n} is a finite collection of open sets, then $\bigcap_{k=1}^{n} A_k$ is open. That
is, O is closed under finite intersection.


Closed sets are then defined to be those whose complement is open, and you're on your 
way to building a structure that will have some parallels to those of R we’ve studied, but 
will be more abstract and austere because the definition of open set does not probe as 
deeply into some assumed structure of S. 
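
For a finite set S the axioms (T1) to (T3) can be checked mechanically. The sketch below is an illustration under that finiteness assumption: when the family is finite, closure under pairwise unions and intersections extends by induction to all unions and finite intersections, so checking pairs suffices. The function name is_topology and the sample families are hypothetical.

```python
from itertools import combinations


def is_topology(space: frozenset, opens: set) -> bool:
    """Check (T1)-(T3) for a family `opens` of frozensets on a finite `space`."""
    if frozenset() not in opens or space not in opens:  # (T1)
        return False
    if any(not u <= space for u in opens):  # every open set is a subset of S
        return False
    for u, v in combinations(opens, 2):
        if (u | v) not in opens:  # (T2), pairwise unions
            return False
        if (u & v) not in opens:  # (T3), pairwise intersections
            return False
    return True


S = frozenset({1, 2, 3})
nested = {frozenset(), frozenset({1}), frozenset({1, 2}), S}
broken = {frozenset(), frozenset({1}), frozenset({2}), S}  # {1} | {2} is missing
print(is_topology(S, nested), is_topology(S, broken))
```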

Theorem 5.6.5 will help you prove the following in Exercise 8. 


Theorem 5.6.7. The continuous image of a compact set is compact. That is, if f : R → R
is continuous on a compact S ⊂ R, then f(S) is compact.


In Exercise 2 from Section 4.2, you showed that if L is the LUB of a set S and L ∉ S,
then L is a limit point of S. Since a compact set is bounded, it has an LUB, and since it’s
also closed, it contains all its limit points. Thus a compact set contains its LUB. Similarly,
a compact set also contains its GLB. If f : R → R is continuous and S ⊂ R is compact,
then f(S) is compact by Theorem 5.6.7. Consequently, f(S) contains its LUB and GLB.
Write M = max[f(S)] and m = min[f(S)]. Then there exist x₁, x₂ ∈ S such that
f(x₁) = m and f(x₂) = M. We have just proved the following theorem, which you
might remember from your calculus class.


Theorem 5.6.8 (Extreme Value Theorem [EVT]). If f : R → R is continuous on a
compact set S, then f attains a maximum and minimum value on S.


EXERCISES 


1. Prove Theorem 5.6.1: If a < b, f(a) < 0 < f(b), and f is continuous on [a, b],
then there exists c ∈ (a, b) such that f(c) = 0.²


² Let A = {x ∈ [a, b] : f(x) < 0}. What can you say about A?




2. Prove the fixed point theorem: If f : [a, b] → [a, b] is continuous, then there exists
c ∈ [a, b] such that f(c) = c.³


3. The fundamental theorem of algebra states that every polynomial P : R → R whose
degree is odd has a root in R. That is, for every polynomial
$P(x) = a_{2n+1}x^{2n+1} + a_{2n}x^{2n} + \cdots + a_1 x + a_0$, there exists some c ∈ R such that P(c) = 0. In this exercise
you prove the fundamental theorem of algebra, first for the case $a_{2n+1} > 0$, then for
the case $a_{2n+1} < 0$.


(a) Use your definitions from Exercise 6 in Section 5.4 to prove


$$\lim_{x \to +\infty} x = +\infty \quad \text{and} \quad \lim_{x \to -\infty} x = -\infty. \tag{5.51}$$


(b) With part (a) in hand, a theorem analogous to Theorem 5.4.8 would be demonstrable
for x → ±∞ by paralleling its proof. Assuming this result, show that if
$a_{2n+1} > 0$, then P(x) satisfies


$$\lim_{x \to -\infty} P(x) = -\infty \quad \text{and} \quad \lim_{x \to +\infty} P(x) = +\infty. \tag{5.52}$$


(c) Use your result from part (b) to prove the fundamental theorem of algebra for
the case $a_{2n+1} > 0$.

(d) Prove the fundamental theorem of algebra for the case $a_{2n+1} < 0$ by applying
part (c) to −P(x).

4. Prove Theorem 5.6.3: Suppose f : S → R is continuous and one-to-one on [a, b] ⊂ S.
Then f is strictly monotone on [a, b].⁴ ⁵


5. Prove Theorem 5.6.4: Suppose f : S → R is continuous and invertible on [a, b] ⊂ S.
Then f⁻¹ : f([a, b]) → [a, b] is continuous on f([a, b]).


6. Prove Theorem 5.6.5: A function f : R → R is continuous if and only if the pre-image
of every open set is open.


7. Prove Corollary 5.6.6: A function f : R → R is continuous if and only if the pre-image
of every closed set is closed.⁶


8. Prove Theorem 5.6.7: The continuous image of a compact set is compact. That is, if
f : R → R is continuous on a compact S ⊂ R, then f(S) is compact.


9. Give two examples to illustrate that compactness is necessary for the EVT to apply
to a continuous function, one example where S is bounded but not closed, and one
where S is closed but not bounded.


³ Consider the function g(x) = f(x) − x. If either g(a) or g(b) is zero, you're done. Otherwise, apply
the IVT.

⁴ See Exercise 3h from Section 1.2.

⁵ Follow your nose. For f not monotone and f(a) < f(b), spend some time showing that there exist
c₁ < c₂ < c₃ such that f(c₁) < f(c₂) > f(c₃) or f(c₁) > f(c₂) < f(c₃).

⁶ See Exercise 8 from Section 3.2.
5.7 Uniform Continuity


We've mentioned earlier in this text the distinction between every question having an
answer and the existence of a single answer that applies to every question. In Definition
5.5.12 we wrote the definition of continuity on a set A logically as


$$(\forall a \in A)(\forall \varepsilon > 0)(\exists \delta > 0)(\forall x \in A)(|x - a| < \delta \rightarrow |f(x) - f(a)| < \varepsilon). \tag{5.53}$$


In general, the value of δ will depend on both ε and a. In this section, we want to define a
form of continuity where δ depends only on ε. We'll look at some examples to illustrate
the point, and then study two important theorems.


5.7.1 Definition and examples 


Let’s take the definition of continuity in Eq. (5.53) and leapfrog the first component piece
(∀a ∈ A) two jumps to the right:


$$(\forall \varepsilon > 0)(\exists \delta > 0)(\forall a \in A)(\forall x \in A)(|x - a| < \delta \rightarrow |f(x) - f(a)| < \varepsilon). \tag{5.54}$$


Taking (∀a ∈ A) and moving it to the right of (∀ε > 0) does not change anything logically
from statement (5.53). However, moving (∀a ∈ A) to the right of (∃δ > 0) makes a big
difference. If you were writing a proof of continuity from (5.53), you would begin by
picking a ∈ A and ε > 0. Then, with both a and ε in hand, you would go hunting for
δ > 0 with certain required properties. However, with the phrase (∀a ∈ A) repositioned
as in (5.54), things are different. If you were writing a proof of a theorem where (5.54) was
involved, you would begin by picking only ε > 0. Then, without knowing any particular
value of a ∈ A, you would have to find δ > 0 from ε alone, and this δ would have to serve
for all a ∈ A. Thus, (5.54) suggests that the value of δ can be found after having specified
only ε, and this δ-value will work for all a, x ∈ A satisfying |x − a| < δ. Because (5.53)
specifies a ∈ A first, we think of a as being the center of a δ-neighborhood and x being
an arbitrary point in that neighborhood. In (5.54), there is really no reason to call one
point a and the other one x, as if one were fixed before the other. The point is that there
exists δ > 0 such that if any two points are within δ of each other, then their functional
values are within ε of each other. For clarity and convenience, we change the symbols
slightly in the following definition.


Definition 5.7.1. A function f : S → R is said to be uniformly continuous on A ⊂ S
if for all ε > 0 there exists δ > 0 such that, for all x, y ∈ A, |x − y| < δ implies
|f(x) − f(y)| < ε. Logically, we may write this as


$$(\forall \varepsilon > 0)(\exists \delta > 0)(\forall x, y \in A)(|x - y| < \delta \rightarrow |f(x) - f(y)| < \varepsilon). \tag{5.55}$$


To show that a function f is uniformly continuous on A, we work backwards from
|f(x) − f(y)| < ε. Watch what happens in the following example.


Example 5.7.2. Show that f(x) = x²/(x + 1) is uniformly continuous on [0, ∞).




Solution: Before we write a demonstration, we need to do some scratchwork. If you
play with the expression |f(x) − f(y)| < ε and try to factor |x − y| out of it, you
can arrive at the following:


$$|f(x) - f(y)| = \left| \frac{x^2}{x + 1} - \frac{y^2}{y + 1} \right| = |x - y| \left| \frac{xy + x + y}{(x + 1)(y + 1)} \right|. \tag{5.56}$$


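Next, let’s split up the fraction in the right-hand side of Eq. (5.56) and apply the
triangle inequality. Also, notice that x, y ≥ 0 implies 1/(x + 1), 1/(y + 1), x/(x + 1),
and y/(y + 1) are all at most 1. So we have


$$\begin{aligned}
\left| \frac{xy + x + y}{(x + 1)(y + 1)} \right|
&= \left| \frac{xy}{(x + 1)(y + 1)} + \frac{x}{(x + 1)(y + 1)} + \frac{y}{(x + 1)(y + 1)} \right| \\
&\le \left| \frac{x}{x + 1} \right| \left| \frac{y}{y + 1} \right| + \left| \frac{x}{x + 1} \right| \left| \frac{1}{y + 1} \right| + \left| \frac{y}{y + 1} \right| \left| \frac{1}{x + 1} \right| \\
&\le 3.
\end{aligned} \tag{5.57}$$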

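Having arrived at |f(x) − f(y)| ≤ 3|x − y|, we’re ready to write a demonstration.
Choose ε > 0, and let δ = ε/3. Then if x, y ≥ 0 and |x − y| < δ, we have


$$\begin{aligned}
|f(x) - f(y)| &= \left| \frac{x^2}{x + 1} - \frac{y^2}{y + 1} \right| = |x - y| \left| \frac{xy + x + y}{(x + 1)(y + 1)} \right| \\
&\le |x - y| \left( \left| \frac{x}{x + 1} \right| \left| \frac{y}{y + 1} \right| + \left| \frac{x}{x + 1} \right| \left| \frac{1}{y + 1} \right| + \left| \frac{y}{y + 1} \right| \left| \frac{1}{x + 1} \right| \right) \\
&\le 3|x - y| < 3\delta = \varepsilon.
\end{aligned} \tag{5.58}$$

∎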

The scratchwork of Example 5.7.2 suggests a theorem that you'll prove in Exercise 1. 


Theorem 5.7.3. If there exists m > 0 such that |f(x) − f(y)| ≤ m|x − y| for all x, y ∈
A, then f is uniformly continuous on A.


If y ≠ x, the hypothesis condition of Theorem 5.7.3 is equivalent to


$$\left| \frac{f(x) - f(y)}{x - y} \right| \le m, \tag{5.59}$$


which means that f has a bound on the slopes of lines through any two points on its
graph. Loosely speaking, if the steepness of f (as measured by slopes of secant lines) is
bounded, then f is uniformly continuous. The converse of Theorem 5.7.3 is not true.
Shortly, we'll point out a function f that is uniformly continuous on a set A, but for
which inequality (5.59) does not hold across A for any m > 0.

Naturally, we want to include a demonstration that a continuous function need 
not be uniformly continuous on a set. What does it mean for f not to be uniformly 
continuous on A?

$$(\exists \varepsilon > 0)(\forall \delta > 0)(\exists x, y \in A)(|x - y| < \delta \text{ and } |f(x) - f(y)| \ge \varepsilon). \tag{5.60}$$


That is, there exists some ε > 0 so that, no matter how small δ > 0 might be, there will
be two points somewhere in the set A that are within δ distance of each other, but whose
functional values are at least ε apart.


Example 5.7.4. Show f(x) = 1/x is not uniformly continuous on (0, 1). 


Solution: First some scratchwork. It’s the vertical asymptote at x = 0 that provides
us with x and y values that are very close together and whose functional values can
differ as much as a strategic ε we'll choose. We have to play with the inequality


$$\left| \frac{1}{x} - \frac{1}{y} \right| = \frac{|y - x|}{|x||y|} \ge \varepsilon \tag{5.61}$$


and find some ε > 0 so that, regardless of δ > 0, we can find two points x and y that
are within δ of each other and satisfy inequality (5.61). No obvious ε-value jumps
out at us, so we'll try letting ε = 1 and see if we can proceed.

Inequality (5.61) itself suggests a way to find x and y. Whatever we decide to let
x be, we can let y = x + δ/2 so that |y − x| = δ/2 < δ. Then the trick is to let x
be sufficiently close to zero so that |x||y| is small enough to make inequality (5.61)
true. Furthermore, since smaller values of δ represent our primary obstacle, we may
assume δ is smaller than any convenient positive number, if such an assumption
presents itself as helpful. If we let x = δ/2 and y = δ, inequality (5.61) falls right
into place as long as δ < 1. Here is our demonstration.

Let ε = 1, and pick any δ > 0. We may assume WLOG that δ < 1. Let x = δ/2
and y = δ. Then |x − y| < δ, and


$$|f(x) - f(y)| = \frac{|y - x|}{|x||y|} = \frac{\delta/2}{\delta^2/2} = \frac{1}{\delta} > 1 = \varepsilon. \tag{5.62}$$

Thus, f is not uniformly continuous on (0, 1). ∎
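
The witness points from Example 5.7.4 are easy to compute for any proposed δ. The sketch below mirrors the demonstration: it shrinks δ if necessary (the cap 0.5 is our own convenient choice for the WLOG step), then returns x = δ/2 and y = δ. The function name witness is hypothetical.

```python
from typing import Tuple


def witness(delta: float) -> Tuple[float, float]:
    """Points x, y in (0, 1) with |x - y| < delta but |1/x - 1/y| >= 1."""
    d = min(delta, 0.5)  # WLOG step: work with a delta below 1
    return d / 2.0, d


for delta in (1.0, 0.1, 1e-3):
    x, y = witness(delta)
    # Columns: proposed delta, |x - y|, and the gap |1/x - 1/y| = 1/d.
    print(delta, abs(x - y), abs(1.0 / x - 1.0 / y))
```

However small δ is taken, the gap between the function values never drops below ε = 1, which is exactly what statement (5.60) requires.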


In Exercise 2, you'll show that f(x) = x² is not uniformly continuous on [0, +∞).


5.7.2 Uniform continuity and compact sets 


Perhaps the most beloved theorem dealing with uniform continuity is the following. 


Theorem 5.7.5. If f : S → R is continuous and S is compact, then f is uniformly
continuous.


There is something about the fact that every open cover of S is reducible to a finite
subcover that allows us to liberate the value of δ from any specific points in the domain
and determine it from ε alone. We'll supply the proof here because it requires a few
sneaky shrinkings of neighborhoods. If you want to try to prove it on your own, here's a
thumbnail sketch of how to proceed.




As usual, we pick ε > 0. Then since f is continuous at every point in S, we can
cover S with a slew of neighborhoods, one centered at each a ∈ S, whose radius is half
the δ-value that guarantees f[N_δ(a)] ⊆ N_{ε/2}[f(a)]. Since every a ∈ S is in its own
neighborhood, the set of all these neighborhoods covers S. Compactness of S then
allows us to reduce this cover to a finite subcover. These finitely many neighborhoods
supply us with a single δ-value: the minimum of the finitely many δ/2-values from the
subcover. We can then show that if |x − y| < δ, then |f(x) − f(y)| < ε. See if you can
fill in the details. If not, here's the whole proof.


Proof: Pick ε > 0. Because f is continuous at every a ∈ S, for any particular
a ∈ S there exists δ(a) > 0 (the notation illustrating that δ is a function of a) such that
|x − a| < δ(a) implies |f(x) − f(a)| < ε/2. Cover S with the set C = {N_{δ(a)/2}(a) :
a ∈ S}. Since every a ∈ S is in its own neighborhood of radius δ(a)/2, C does in fact
cover S. Since S is compact, C has a finite subcover C₁ = {N_{δ(a_k)/2}(a_k) : 1 ≤ k ≤ n}.
Let δ = min{δ(a_k)/2 : 1 ≤ k ≤ n}, and pick x, y ∈ S such that |x − y| < δ. Now
since C₁ covers S, there exists some k such that x ∈ N_{δ(a_k)/2}(a_k). Furthermore,
y ∈ N_{δ(a_k)}(a_k) because

\[ |y - a_k| \le |y - x| + |x - a_k| < \delta + \frac{\delta(a_k)}{2} \le \frac{\delta(a_k)}{2} + \frac{\delta(a_k)}{2} = \delta(a_k). \tag{5.63} \]

Thus

\[ |f(x) - f(y)| \le |f(x) - f(a_k)| + |f(a_k) - f(y)| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon, \tag{5.64} \]

so that f is uniformly continuous on S. ■


In Exercise 3 from Section 5.5, you showed that f(x) = √x is continuous on
(0, +∞). Since lim_{x→0⁺} √x = 0 = √0, f is continuous on [0, 1], hence it is uniformly
continuous there. However, the graph of f becomes very steep as x → 0⁺, so that (5.59)
is not satisfied. In Exercise 3, you'll demonstrate that this is true.

In Example 5.7.4, the fact that (0, 1) is not closed allows for 1/x to have a vertical
asymptote at one endpoint. Even though 1/x is continuous throughout (0, 1), the
asymptote is where we look to disprove uniform continuity. When you show that f(x) = x² is
not uniformly continuous on [0, +∞) in Exercise 2, it is by looking among sufficiently
large numbers that you'll find x and y within δ of each other whose functional values
differ by at least ε.


EXERCISES 


1. Prove Theorem 5.7.3: If there exists m > 0 such that |f(x) − f(y)| ≤ m|x − y|
for all x, y ∈ A, then f is uniformly continuous on A.


2. Show f(x) = x² is not uniformly continuous on [0, +∞).


3. Show that inequality (5.59) is not satisfied by f(x) = √x on [0, 1] for any m > 0.

CHAPTER 6

Groups


In its simplest terms, algebra can be thought of as the study of sets on which binary
operations provide the defining internal structure for the set. For example, we may
construct a set S and define a form of addition or multiplication, then look at the structure
of S and the relationships between its elements that result from these operations.

One particularly interesting feature of algebra is the study of mappings between sets
S₁ and S₂ where the structure of the binary operation on S₁ is preserved among the images
of the elements in S₂. More concretely, if S₁ has binary operation ∗, if S₂ has binary
operation ·, and if f : S₁ → S₂, we address the question of whether f(a ∗ b) = f(a) · f(b)
for all a, b ∈ S₁. Such is a glimpse into what constitutes algebra. The internal structure
of a set is addressed in the way that those elements combine to produce other elements
of the set.

In this chapter, we begin our study of some of the most basic concepts of algebraic
structures by starting with groups. Some of the theorems are actually restatements of
results we have already seen for R. Now, however, the context is broader and more abstract,
hence the air might seem a little thinner. Instead of proving algebraic theorems where
we have the real numbers specifically in mind, we prove theorems based on assumptions 
that certainly apply to real numbers, but of which the real numbers are only a specific 
example. 


6.1 Introduction to Groups 


In this section, we discuss some basic characteristics of the simplest algebraic structures. 
The structure called a group is a good place to begin. 
6.1.1 Basic characteristics of algebraic structures

An algebraic structure begins with a nonempty set S and builds its internal structure in
several stages. First, there must be some notion of equality either defined or understood
on S, which naturally must be an equivalence relation. In Chapter 0, we laid out
assumptions of R, one of which was that equality in R satisfies properties E1–E3. We didn't
really probe into what's behind equality in R. However, our work in Section 2.6 was a
very important example of how equality sometimes arises in a set. Assuming without
question that equality in Z is an equivalence relation, we built the rationals as the set of
pairs of integers p/q, with q ≠ 0. Then as soon as the elements of Q were constructed, we
defined a form of equality in Q in terms of equality in Z, and showed that this definition
is an equivalence relation on Q.

Once the set S is constructed, we define one or more binary operations. If we denote
an arbitrary binary operation by ∗, we want the definition of ∗ to have two features.
First, ∗ should be well defined, as addition and multiplication are assumed to be on R.
That is, given a, b, c, d ∈ S, where a = b and c = d (equality here being the equivalence
relation defined on S), we want a ∗ c = b ∗ d. Second, ∗ should be closed. Just as +
and × are closed on R, we want assurance that combining two elements a, b ∈ S by the
operation ∗ will produce a ∗ b ∈ S as well.


Example 6.1.1. Let A be a nonempty set, and let S be the set of all one-to-one, onto
functions from A to itself. Then the binary operation of composition is closed on S by
Exercise 2 from Section 3.2 (Questions Q2 and Q4).


Example 6.1.2. Consider the set of irrational numbers with the binary operation ×.
Since √2 is irrational, but √2 × √2 = 2 ∈ Q, × is not closed as an operation on
the irrationals.


If you covered Section 2.9, you can see that a binary operation on S can be defined as
a function f : S × S → S. The formal notation that would be used to represent addition
as such a function would be + : S × S → S, writing +(a, b) to mean a + b. The fact
that + as a function has property F1 is precisely what it means for the binary operation +
to be closed as we've used the term throughout this text. That + has property F2 means +
is well defined as a binary operation, as we've used the term.

All the binary operations we’ll address will have the associative property, as do + and 
x on R. However, we might have to verify associativity if the context is a new one and 
we've created a binary operation from scratch. Some binary operations are not associative, 
but they are indeed rare. 

Many, but not all, of the algebraic structures we’ll consider will have a binary oper- 
ation that is commutative. Some very interesting results of algebra derive from binary 
operations that are not commutative. Be careful in your work! Unless commutativity is 
explicitly given or proven, you might be tempted to reverse the order of elements without 
any permission to do so. 

The existence of an identity element for * and inverses for elements is rather context 
specific. We insist in Definition 6.1.3 that an identity element must commute with every 
element of the set, regardless of whether the binary operation is commutative. 

Definition 6.1.3. Suppose S is endowed with binary operation ∗. Then e ∈ S is called an
identity for the operation ∗ if a ∗ e = e ∗ a = a for all a ∈ S.


If there is an identity element in S for the binary operation ∗, then it might be that
some or all elements have an inverse under ∗.


Definition 6.1.4. Suppose S has binary operation ∗, for which there is an identity
element e. If for a given a ∈ S there exists b ∈ S such that a ∗ b = b ∗ a = e, then we
say that b is an inverse of a, and we write it as a⁻¹.


For convenience we sometimes list the features of an algebraic structure as an
ordered list. For example, if S is a set with binary operation ∗, identity e, and with the
feature that every a ∈ S has an inverse a⁻¹, then we might write such a structure as
(S, ∗, e, ⁻¹).

Of the basic algebraic properties we assumed on R (A2–A14), the only one we
have not mentioned here is the distributive property A14. If a set is endowed with
two binary operations, it might be that they are linked in their behavior by the
distributive property. We'll address algebraic structures with two binary operations in
Chapter 7.

If S is a small finite set, it might be convenient to describe the binary operation in
what is called a Cayley table. Since a binary operation might not be commutative, it's
important to read a ∗ b from a Cayley table by going down the left column to find a and
across the top to find b.


Example 6.1.5. Consider S = {0, 1, 2, 3, 4, 5} and let ⊕ be described as in Table 6.1:


⊕ | 0 1 2 3 4 5
0 | 0 1 2 3 4 5
1 | 1 2 3 4 5 0
2 | 2 3 4 5 0 1          (6.1)
3 | 3 4 5 0 1 2
4 | 4 5 0 1 2 3
5 | 5 0 1 2 3 4

The fact that ⊕ is well defined is immediate from the Cayley table, for there is a unique
value in each position in the table. Closure is obvious, for every entry is an element of S.
Notice ⊕ is commutative. How can you tell?¹ Is there an identity element?² Does every
element have an inverse under ⊕?³ One way to verify associativity would be to work
through every possible calculation. In Section 6.3, we'll construct this algebraic structure
formally and prove associativity as a theorem.
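
For a table this small, working through every possible calculation is easy to hand to a machine. The sketch below is an illustration rather than part of the text. It assumes that Table 6.1 is ordinary addition reduced modulo 6 (the reading given later in Section 6.3), and the names `op`, `identities`, and `inverses` are invented here.

```python
from itertools import product

S = range(6)

def op(a, b):
    """Assumed reading of Table 6.1: add, then reduce modulo 6."""
    return (a + b) % 6

closed = all(op(a, b) in S for a, b in product(S, repeat=2))
commutative = all(op(a, b) == op(b, a) for a, b in product(S, repeat=2))
# Associativity by exhaustion: all 6**3 = 216 triples are checked.
associative = all(op(op(a, b), c) == op(a, op(b, c))
                  for a, b, c in product(S, repeat=3))
# Two-sided identities, and for each element the list of its two-sided inverses.
identities = [e for e in S if all(op(a, e) == a == op(e, a) for a in S)]
inverses = {a: [b for b in S if op(a, b) == 0 == op(b, a)] for a in S}

print(closed, commutative, associative, identities)
print(inverses)
```

An exhaustive check like this settles one particular table; it does not replace the general argument promised for Section 6.3.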


¹ Look for a diagonal symmetry.
² Yes, 0.
³ Yes. 0⁻¹ = 0, 1⁻¹ = 5, 2⁻¹ = 4, and so on.

Example 6.1.6. Let S be as in Example 6.1.5 and define the operation ⊗ as in Table 6.2:


⊗ | 0 1 2 3 4 5
0 | 0 0 0 0 0 0
1 | 0 1 2 3 4 5
2 | 0 2 4 0 2 4          (6.2)
3 | 0 3 0 3 0 3
4 | 0 4 2 0 4 2
5 | 0 5 4 3 2 1


Is ⊗ commutative?⁴ Is there an identity element?⁵ Does every element have an
inverse?⁶
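
The question about inverses can be answered by the same kind of exhaustive search. This short sketch assumes Table 6.2 is multiplication reduced modulo 6, which agrees with every entry of the table; the name `times` is made up for illustration.

```python
S = range(6)

def times(a, b):
    """Assumed reading of Table 6.2: multiply, then reduce modulo 6."""
    return (a * b) % 6

identity = 1
# a is invertible if some b combines with it, on both sides, to give the identity.
invertible = [a for a in S
              if any(times(a, b) == identity == times(b, a) for b in S)]
print(invertible)
```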


6.1.2 Groups defined 


We have different names to refer to algebraic structures with different features. Our first 
one is the following. 


Definition 6.1.7. Suppose G is a set with associative binary operation ∗, identity element
e, and with the property that every g ∈ G has an inverse g⁻¹ under ∗. Then the algebraic
structure (G, ∗, e, ⁻¹) is called a group. If ∗ is a commutative binary operation, then G is called an
abelian group (after the mathematician Niels Henrik Abel (1802–1829)). If |G| ∈ N, G is said
to be finite, and |G| is called the order of the group (sometimes denoted o(G) instead of |G|).


According to Definition 6.1.7, a group has the following defining features:
(G1) the operation ∗ is well defined;
(G2) the operation ∗ is closed;
(G3) the operation ∗ is associative;
(G4) there is an identity element e; and
(G5) every element of G has an inverse under ∗.

Example 6.1.8. The real numbers with binary operation addition, identity zero, and
additive inverses form an abelian group: (R, +, 0, −). Note that G1 is property A2, G2
is A3, G3 is A4, G4 is A6, and G5 is A7. That R is abelian is property A5. Also, Q and Z
are abelian groups under addition.


Example 6.1.9. The nonzero real numbers R\{0} with multiplication form an abelian
group: (R\{0}, ×, 1, ⁻¹). Also, Q\{0}, R⁺ and Q⁺ are abelian groups under
multiplication.


⁴ Yes.
⁵ Yes, 1.
⁶ No, only 1 and 5.

Example 6.1.10. The algebraic structure (S, ⊕, 0, −) from Example 6.1.5 is an abelian
group, while (S, ⊗, 1, ⁻¹) from Example 6.1.6 is not a group, because some elements do
not have inverses.


Example 6.1.11. Here is another example of a finite abelian group. In Exercise 9 from
Section 2.3, you observed that x² = −1 has no solution for x ∈ R. Nothing prevents us
from creating a symbol, say, i, declaring i² = −1, and noting that i ∉ R. Now consider
the set S = {±1, ±i} with binary operation × defined according to Table 6.3. Then
(S, ×, 1, ⁻¹) is an abelian group.

 ×  |  1  −1   i  −i
 1  |  1  −1   i  −i
−1  | −1   1  −i   i          (6.3)
 i  |  i  −i  −1   1
−i  | −i   i   1  −1


Example 6.1.12. Here's an example that is similar to Example 6.1.11 on the set Q =
{±1, ±i, ±j, ±k}. See Table 6.4:


 ×  |  1  −1   i  −i   j  −j   k  −k
 1  |  1  −1   i  −i   j  −j   k  −k
−1  | −1   1  −i   i  −j   j  −k   k
 i  |  i  −i  −1   1   k  −k  −j   j
−i  | −i   i   1  −1  −k   k   j  −j          (6.4)
 j  |  j  −j  −k   k  −1   1   i  −i
−j  | −j   j   k  −k   1  −1  −i   i
 k  |  k  −k   j  −j  −i   i  −1   1
−k  | −k   k  −j   j   i  −i   1  −1

The gist of this algebraic structure can be understood by noticing that i² = j² =
k² = −1, but i, j, and k do not commute with each other. If you think of the letters i,
j, and k being written on the face of a clock at 12, 4, and 8 o'clock, respectively, then
multiplication of two different elements in the clockwise direction produces the third.
Multiplication of two elements in the counterclockwise direction produces the negative
of the third. For example, j × k = i and i × k = −j. These three square roots of −1 are
called quaternions, and they motivate a group on eight elements.
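
The clock-face rule is concrete enough to turn directly into a multiplication routine. What follows is a sketch under stated assumptions, not the book's construction: each element is stored as a pair (sign, unit) with unit one of "1", "i", "j", "k", and the names `CLOCK` and `qmul` are hypothetical.

```python
from itertools import product

CLOCK = "ijk"  # clockwise order of the letters on the clock face

def qmul(x, y):
    """Multiply two elements written as (sign, unit), sign = +1 or -1."""
    (s, u), (t, v) = x, y
    sign = s * t
    if u == "1":
        return (sign, v)
    if v == "1":
        return (sign, u)
    if u == v:                       # i^2 = j^2 = k^2 = -1
        return (-sign, "1")
    third = (set(CLOCK) - {u, v}).pop()
    # Clockwise products give the third letter, counterclockwise its negative.
    clockwise = CLOCK[(CLOCK.index(u) + 1) % 3] == v
    return (sign if clockwise else -sign, third)

Q = [(s, u) for u in "1ijk" for s in (1, -1)]
I, J, K = (1, "i"), (1, "j"), (1, "k")

print(qmul(J, K), qmul(I, K))        # j x k = i and i x k = -j
print(qmul(I, J), qmul(J, I))        # the two orders disagree
associative = all(qmul(qmul(a, b), c) == qmul(a, qmul(b, c))
                  for a, b, c in product(Q, repeat=3))
print(associative)
```

Printing `qmul(a, b)` over all pairs from `Q` gives one way to regenerate the 64 entries of Table 6.4 under this encoding.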

If someone gives you a set with a binary operation and asks you to show that it’s a 
group, you must verify properties G1-G5. To give you an idea of how that might look, 
we're going to walk through most of the details of a specific example now. 

Define the complex numbers by 


\[ \mathbb{C} = \{a + bi : a, b \in \mathbb{R}\}. \tag{6.5} \]


We should note immediately that i is a mere symbol with no meaning at this point. We 
should think of C merely as a set of ordered real number pairs, the first of which stands 
alone, and the second of which is tagged with an adjacent i. 

Next, we define equality in C, which for clarity we temporarily denote =_C. We define
a₁ + b₁i =_C a₂ + b₂i provided a₁ =_R a₂ and b₁ =_R b₂. Notice how =_C is defined
in terms of =_R, which we assume is an equivalence relation. To show that =_C is an
equivalence relation is pretty trivial. For example, to show =_C has property E2, suppose
a₁ + b₁i =_C a₂ + b₂i. Then a₁ =_R a₂ and b₁ =_R b₂. Calling on property E2 in R,
a₂ =_R a₁ and b₂ =_R b₁. Thus, a₂ + b₂i =_C a₁ + b₁i.

Define the binary operation ⊕ in the following way:

\[ (a + bi) \oplus (c + di) = (a + c) + (b + d)i. \tag{6.6} \]

We claim that C is an abelian group under ⊕. Here are the steps of the proof in meticulous
detail. Working through them will give you a good sense of direction in Exercise 1, where
you'll show that C\{0 + 0i} with a form of multiplication is a group.


(G1) Suppose a₁ + b₁i =_C a₂ + b₂i and c₁ + d₁i =_C c₂ + d₂i. Then a₁ =_R a₂, b₁ =_R b₂,
c₁ =_R c₂, and d₁ =_R d₂. Since addition in R is well defined, a₁ + c₁ =_R a₂ + c₂
and b₁ + d₁ =_R b₂ + d₂. Thus

\[
\begin{aligned}
(a_1 + b_1 i) \oplus (c_1 + d_1 i) &=_{\mathbb{C}} (a_1 + c_1) + (b_1 + d_1)i \\
&=_{\mathbb{C}} (a_2 + c_2) + (b_2 + d_2)i \\
&=_{\mathbb{C}} (a_2 + b_2 i) \oplus (c_2 + d_2 i).
\end{aligned} \tag{6.7}
\]


(G2) Pick a + bi, c + di ∈ C. Since R is closed under addition, a + c, b + d ∈ R, so
that (a + bi) ⊕ (c + di) = (a + c) + (b + d)i ∈ C.


(G3)
\[
\begin{aligned}
[(a + bi) \oplus (c + di)] \oplus (e + fi) &= [(a + c) + (b + d)i] \oplus (e + fi) \\
&= [(a + c) + e] + [(b + d) + f]i \\
&= [a + (c + e)] + [b + (d + f)]i \\
&= (a + bi) \oplus [(c + e) + (d + f)i] \\
&= (a + bi) \oplus [(c + di) \oplus (e + fi)].
\end{aligned} \tag{6.8}
\]


(G4) Pick a + bi ∈ C. Then (a + bi) ⊕ (0 + 0i) = (a + 0) + (b + 0)i = a + bi, so
0 + 0i functions as an additive identity in C.


(G5) Pick a + bi ∈ C. Then a, b ∈ R, so that −a, −b ∈ R also. Thus (−a) + (−b)i ∈ C,
and (a + bi) ⊕ [(−a) + (−b)i] = (a − a) + (b − b)i = 0 + 0i.


Finally, (C, ⊕, 0 + 0i, −) is abelian because
\[
\begin{aligned}
(a + bi) \oplus (c + di) &= (a + c) + (b + d)i \\
&= (c + a) + (d + b)i \\
&= (c + di) \oplus (a + bi).
\end{aligned} \tag{6.9}
\]


In Exercises 6 and 7 from Section 2.4, we defined aⁿ for a ∈ R\{0} and n ∈ Z and
proved the rules for exponents

\[ a^m \cdot a^n = a^{m+n} \tag{6.10} \]
\[ (a^m)^n = a^{mn} \tag{6.11} \]
\[ (ab)^n = a^n b^n. \tag{6.12} \]

In the context of an arbitrary group G with binary operation ∗, we can now fix any a ∈ G
and make the same definitions for aⁿ, where we write

\[ a^0 = e, \tag{6.13} \]
\[ a^{n+1} = a^n * a \quad \text{for } n \ge 0, \tag{6.14} \]
\[ a^{-n} = (a^{-1})^n. \tag{6.15} \]

Furthermore, by mimicking exactly the proofs from these exercises done in the context
of R\{0}, we arrive at similar exponent rules for ∗ on G, except that one rule depends
on G being abelian.


Theorem 6.1.13. If G is a group, a, b ∈ G, and m, n ∈ Z, then

\[ a^m * a^n = a^{m+n} \tag{6.16} \]
\[ (a^m)^n = a^{mn}. \tag{6.17} \]

Furthermore, if G is abelian,

\[ (a * b)^n = a^n * b^n. \tag{6.18} \]
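
The recursive definitions in Eqs. (6.13)–(6.15) translate almost word for word into a routine for computing powers in any group whose operation, identity, and inverse map are supplied. The sketch below is illustrative only: the function name `power` and its parameters `op`, `e`, and `inv` are invented here, and the group used to exercise it is {0, ..., 5} under addition modulo 6, the assumed reading of Table 6.1.

```python
def power(a, n, op, e, inv):
    """Compute a^n following Eqs. (6.13)-(6.15)."""
    if n < 0:
        return power(inv(a), -n, op, e, inv)   # a^(-n) = (a^(-1))^n
    result = e                                  # a^0 = e
    for _ in range(n):
        result = op(result, a)                  # a^(k+1) = a^k * a
    return result

# Test group (an assumption for this sketch): addition modulo 6.
def op(a, b):
    return (a + b) % 6

def inv(a):
    return (-a) % 6

e = 0
exponents = range(-7, 8)
rule_616 = all(op(power(a, m, op, e, inv), power(a, n, op, e, inv))
               == power(a, m + n, op, e, inv)
               for a in range(6) for m in exponents for n in exponents)
rule_617 = all(power(power(a, m, op, e, inv), n, op, e, inv)
               == power(a, m * n, op, e, inv)
               for a in range(6) for m in exponents for n in exponents)
print(rule_616, rule_617)
```

A finite check of this kind only spot-tests the rules on one small abelian group; the theorem itself needs the induction arguments mentioned above.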


6.1.3 Subgroups 

You might have noticed that S from Example 6.1.11 is a subset of Q from Example 6.1.12. 
More than that, the binary operation on S is the same as that on Q, in the sense that 
Table 6.3 is a subtable of Table 6.4. Thus, S is closed under ×, contains 1, and is closed
under inverses. We give such a subset a name.


Definition 6.1.14. Suppose (G, ∗, e, ⁻¹) is a group, and H ⊆ G is itself a group under
the same operation. Then (H, ∗, e, ⁻¹) is called a subgroup of G, and we write H ≤ G. If
H ⊊ G, then H is called a proper subgroup of G.


If we are given a group (G, ∗, e, ⁻¹) and a set H ⊆ G, and are asked to show that
H ≤ G, we must show H is itself a group under ∗. Properties G1 and G3 are automatically
satisfied on H, for if ∗ is well defined and associative on all of G, then certainly it is well
defined and associative when restricted to H. We say that H inherits these properties
from G. We must demonstrate the following:


(H1) the operation ∗ is closed on H;
(H2) the identity element e is in H; and
(H3) the inverse of every element of H is also in H; that is, H is closed under inverses.


Example 6.1.15. For any group G, {e} and G themselves satisfy properties H1–H3, and
are therefore subgroups of G. We call {e} the trivial subgroup.

Example 6.1.16. The set of even integers is a subgroup of (Z, +, 0, −) since it is closed
under addition, contains 0, and is closed under negation.


Convenience is desirable if it costs nothing in clarity. For this reason, we'll often refer
to a group (G, ∗, e, ⁻¹) simply as the group G, and we'll sometimes use juxtaposition
of terms ab to indicate the binary operation a ∗ b, just like we do with multiplication
in R. If we're working with two groups G and H, we'll probably be more careful at first
to distinguish between the symbols for the binary operations, and we might denote the
identity elements as e_G and e_H, respectively, just to be clear.

Suppose H ≤ G and H ⊆ H₁ ⊆ G ⊆ G₁. We can make the following observations
about relationships between these sets. First, if H₁ is a group under ∗, then the fact that
H ≤ G implies H ≤ H₁ also. For clearly, if H exhibits properties H1–H3 as a subset of
G, it also does as a subset of H₁. Similarly, if G₁ is a group under ∗, then H ≤ G₁. The
most efficient way to state this latter relationship is to say that ≤ is transitive: If H ≤ G
and G ≤ G₁, then H ≤ G₁.


EXERCISES 


1. Define a form of multiplication ⊗ on C\{0 + 0i} by

\[ (a + bi) \otimes (c + di) = (ac - bd) + (ad + bc)i. \tag{6.19} \]

Show that C\{0 + 0i} is an abelian group under ⊗.⁷ ⁸


2. Define a binary operation ∗ on Z by a ∗ b = a + b + 1. Prove that Z with ∗ forms
an abelian group.


3. If we define a binary operation ∗ by a ∗ b = a + b − ab, then there exists x₀ ∈ R such
that R\{x₀} is a group under ∗. Find x₀, and show that R\{x₀} with ∗ is an abelian
group.

4. Suppose (G, ∗, e, ⁻¹) is a group.

(a) Show that the identity element is unique.
(b) Prove the left and right cancellation laws:
i. If c ∗ a = c ∗ b, then a = b.
ii. If a ∗ c = b ∗ c, then a = b.
(c) Show that the inverse of a ∈ G is unique.
(d) Show that (a ∗ b)⁻¹ = b⁻¹ ∗ a⁻¹.
(e) Show that (a₁ ∗ a₂ ∗ ⋯ ∗ aₙ)⁻¹ = aₙ⁻¹ ∗ aₙ₋₁⁻¹ ∗ ⋯ ∗ a₁⁻¹.


5. Find all subgroups of (S, ©, 0, —) from Example 6.1.5. 


⁷ Showing closure of ⊗ involves verifying that the product is never 0 + 0i. Prove contrapositively by
showing that if (a + bi) ⊗ (c + di) = 0 + 0i, then either a = b = 0 or c = d = 0, so that either
a + bi or c + di is not an element of C\{0}.

⁸ If (a + bi) ⊗ (c + di) = 0 + 0i, then ac − bd = 0 and ad + bc = 0. Square these and add.


6. From the multiplicative group C\{0 + 0i} in Exercise 1, let

\[ H = \{a + bi \in \mathbb{C} : a^2 + b^2 = 1\}. \tag{6.20} \]

Show that H ≤ C\{0 + 0i}.


7. Let G = {a + b√2 : a, b ∈ Q, a and b not both zero}. Clearly, G ⊆ R, and by
Exercise 5 from Section 2.8, a + b√2 = c + d√2 if and only if a = c and b = d. We
want to show that G is a subgroup of (R\{0}, ×, 1, ⁻¹).

(a) Show 0 ∉ G.
(b) Show that G is a subgroup of (R\{0}, ×, 1, ⁻¹) by showing it has properties
H1–H3.


8. Define the center of a group G to be the set of all elements that commute with all
elements of G. That is, the center is

\[ C = \{a \in G : a * x = x * a \text{ for all } x \in G\}. \tag{6.21} \]

Show that the center of G is a subgroup of G.


9. Suppose {H_α}_{α∈A} is a family of subgroups of a group G. Show that

\[ \bigcap_{\alpha \in A} H_\alpha \le G. \tag{6.22} \]


10. Suppose {H_α}_{α∈A} is a family of subgroups of a group G. Is it true that

\[ \bigcup_{\alpha \in A} H_\alpha \le G? \tag{6.23} \]

Prove or give a counterexample.


11. Suppose G is a group such that a ∗ a = e for all a ∈ G. Show that G is abelian.⁹


12. Let G be a group, and fix some g ∈ G. Define f : G → G by f(x) = g ∗ x for all
x ∈ G. Show that f is a one-to-one function from G onto G.


6.2 Generated and Cyclic Subgroups 


From the quaternion group Q in Example 6.1.12, let's take some subset of Q, say, A =
{i, j}. Although A ⊆ Q, A is clearly not a subgroup of Q. However, we can talk about
the smallest subgroup of Q that contains all elements of A. In this section, we address 
the existence and possible uniqueness of what we call the subgroup generated by A. Then 
we'll look specifically at the special case of a subgroup generated by a single element. 
Finally, we'll look at some properties of groups with the special feature that they can be 
generated in their entirety by some single element. 


⁹ a ∗ a = e is equivalent to a = a⁻¹. Use Exercise 4d.

6.2.1 Subgroup generated by A ⊆ G


Let's begin by defining the term that conceptually means the smallest subgroup containing
all elements of A ⊆ G.


Definition 6.2.1. Suppose G is a group and A ⊆ G is nonempty. Suppose also that H ⊆
G has the following properties:


(U1) A ⊆ H;
(U2) H ≤ G; and
(U3) If B ≤ G and A ⊆ B, then H ⊆ B.


Then H is called a subgroup generated by A, and is denoted ⟨A⟩.


Exactly as in Definition 2.7.12, where we defined gcd(a, b), we note that
Definition 6.2.1 is merely a definition, and does nothing to guarantee the existence of such a
subgroup. However, as a definition it does what it's supposed to do: it describes what
we would call a smallest subgroup containing all elements of A. Notice how properties
U1–U3 do just that. Property U1 guarantees that any H we might be tempted to call
⟨A⟩ does in fact contain all elements of A, and property U2 guarantees that it is indeed
a subgroup of G. Property U3 guarantees that no other subgroup of G that contains all
elements of A will be any smaller. The fact that ⟨A⟩ exists uniquely is guaranteed by the
following theorem, which gives us one way to visualize its construction.


Theorem 6.2.2. Let G be a group, and suppose A ⊆ G is nonempty. Then ⟨A⟩ exists
uniquely. In fact,

\[ \langle A \rangle = \bigcap_{\substack{J \le G \\ J \supseteq A}} J. \tag{6.24} \]

Theorem 6.2.2 states that a way to construct ⟨A⟩ is to collect into a family all the
subgroups of G that are supersets of A, and then take the intersection across this family.
An important question to ask about the construction in Eq. (6.24) is whether there even
exist any subgroups J to use in the intersection. If there are no subgroups of G that
contain all elements of A, then Eq. (6.24) is a vacuous construction. However, since
G ≤ G and A ⊆ G, this is not the case. You'll prove that the construction in Eq. (6.24)
satisfies properties U1–U3 and that ⟨A⟩ is unique in Exercise 1.

Equation (6.24) might not actually be the way you would construct ⟨A⟩ for a given
A ⊆ G. It might be easier to begin with the elements of A and build up ⟨A⟩ by tossing
in the needed elements of G until you're sure you're finished.


Example 6.2.3. From Example 6.1.12, and A = {i, j}, determine ⟨A⟩.


Solution: Clearly, ⟨A⟩ must contain 1, and by property H1, closure of ×, it must
contain i² = −1 and i × j = k. Continuing, −i, −j, −k ∈ ⟨A⟩ also. Thus, ⟨A⟩ = Q. ■
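
The build-up used in Example 6.2.3 (start with A and keep tossing in products until nothing new appears) can be automated for this eight-element group. The following is a sketch under assumptions that are not in the text: elements are encoded as integer 4-tuples (a, b, c, d) standing for a + bi + cj + dk, multiplication uses Hamilton's standard product formula (which reproduces Table 6.4 on the eight units), and the names `qmul` and `generated` are hypothetical. Because every element here has finite order, the identity and all inverses turn up as powers, so closing under products alone is enough.

```python
def qmul(p, q):
    """Hamilton product of (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def generated(A):
    """Close a set of quaternion units under qmul by repeatedly adding products."""
    H = set(A)
    while True:
        bigger = H | {qmul(x, y) for x in H for y in H}
        if bigger == H:       # nothing new was produced, so H is closed
            return H
        H = bigger

I, J = (0, 1, 0, 0), (0, 0, 1, 0)
print(len(generated({I, J})))     # all eight elements of Q
print(sorted(generated({I})))     # the four elements 1, i, -1, -i
```

The loop is only guaranteed to stop when the starting set lies in a finite group, as it does here; for an infinite group this naive closure may never terminate.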




Constructing ⟨A⟩ in the manner of Example 6.2.3 might be a bit complicated,
especially if G is infinite and not abelian, so be careful.


6.2.2 Cyclic subgroups 


If A = {a} contains only one element, we usually write ⟨a⟩ instead of ⟨{a}⟩. In this case,
we call ⟨a⟩ the subgroup of G generated by a, and a is called a generator. A subgroup that
has a generator is called a cyclic subgroup. The easiest way to see what ⟨a⟩ looks like is to
build it from the bottom up, so to speak, and then demonstrate that what you've created
satisfies U1–U3. Using the definition of aⁿ from Section 6.1 (page 212), consider the set

\[ S = \{a^n : n \in \mathbb{Z}\}. \tag{6.25} \]

The claim is that S = ⟨a⟩, which we'll verify here by showing S has properties
U1–U3, thereby earning the right to be called ⟨a⟩. The details will help you in
proving Theorem 6.2.9.

Let n = 1 to see that a ∈ S, so that S has property U1. To show that S has property U2,
we must show that it has properties H1–H3:


(H1) Pick x, y ∈ S. Then there exist m, n ∈ Z such that x = aᵐ and y = aⁿ. Now
m + n ∈ Z, so that aᵐ ∗ aⁿ = aᵐ⁺ⁿ ∈ S. Thus, S is closed under ∗.


(H2) Letting n = 0, we have that e = a⁰ ∈ S.


(H3) Pick aⁿ ∈ S. Since Theorem 6.1.13 applies for all m, n ∈ Z, we have that (aⁿ)⁻¹ =
a⁻ⁿ ∈ S. Thus, S is closed under inverses.


To show that S has property U3, suppose B ≤ G and {a} ⊆ B. We show S ⊆ B by
showing aⁿ ∈ B for all n ∈ Z. Since a ∈ B and B is closed under ∗, it must be that
aⁿ ∈ B for all n ∈ N. Certainly a⁰ = e ∈ B, and since B is closed under inverses, a⁻ⁿ ∈ B
for all n ∈ N. Thus, S ⊆ B, and we have finished the proof that ⟨a⟩ = {aⁿ : n ∈ Z}.

Example 6.2.4. In (R⁺, ·, 1, ⁻¹),

⟨3⟩ = {3ⁿ : n ∈ Z} = {..., 1/9, 1/3, 1, 3, 9, ...}.

Example 6.2.5. To construct ⟨3⟩ in (Z, +, 0, −), note that zero is the identity in this
context, so that 3⁰ = 0. Constructing 3ⁿ for n ∈ N is repeated addition, not repeated
multiplication. Thus

\[ 3^1 = 3, \quad 3^2 = 3 + 3 = 6, \quad 3^3 = 6 + 3 = 9, \quad 3^4 = 9 + 3 = 12, \quad \text{and so on.} \tag{6.26} \]

Constructing 3⁻ⁿ involves negation, not reciprocation, so that

\[
\begin{aligned}
3^{-1} &= -3, \\
3^{-2} &= (3^{-1})^2 = (-3) + (-3) = -6, \\
3^{-3} &= (3^{-1})^3 = (-6) + (-3) = -9, \quad \text{and so on.}
\end{aligned} \tag{6.27}
\]

Thus, ⟨3⟩ = {3k : k ∈ Z}.

Example 6.2.5 suggests that it might be worthwhile to rewrite the exponent definitions
in Eqs. (6.13)–(6.15) and the exponent rules in Eqs. (6.16)–(6.18) in what we call their
additive form. In particular, if (G, +, 0, −) is a group and a ∈ G, the fact that the binary
operation is a form of addition suggests that, instead of writing a⁰ = e, which seems to
connote multiplication, we instead write


0a = 0. (6.28) 


We must be careful, though, for the zero on the left-hand side of Eq. (6.28) is an integer, 
and the zero on the right-hand side is the identity element of the group. By saying 0a = 0, 
we mean that the group element a is, in a sense, added to itself zero times, not multiplied 
by itself, and this is to be defined as the zero of the group. Similarly, the additive form of 
Eq. (6.16) would be 


ma + na = (m + n)a. (6.29)


You'll write the additive forms of the other exponent definitions and rules in Exercise 2. 

The subgroup generated by a ∈ G might or might not be an infinite set. In
Example 6.2.4, the subgroup generated by 3 is an infinite cyclic subgroup. However,
in (C\{0 + 0i}, ×, 1 + 0i, ⁻¹), the subgroup generated by i has a different look. First,
i⁰ = 1, i¹ = i, i² = −1, and i³ = −i. But i⁴ = 1, and 4 is the smallest positive power
of i for which this is true. Furthermore, for any n ∈ Z, n = 4k + r where 0 ≤ r ≤ 3, so
that iⁿ = i⁴ᵏ⁺ʳ = (i⁴)ᵏ · iʳ = iʳ. Thus every power of i is an element of {i⁰, i¹, i², i³},
and we've shown that

\[ \langle i \rangle = \{i^0, i^1, i^2, i^3\} = \{1, i, -1, -i\}, \tag{6.30} \]

so that ⟨i⟩ is a finite cyclic subgroup in an infinite group.

This suggests a general result for some cyclic subgroups. Suppose G is a group,
and a ∈ G. Suppose there exists some nonzero k ∈ Z such that aᵏ = e. Since a⁻ᵏ would also be
the identity, we may assume k ∈ N. Of all values of k ∈ N for which aᵏ = e, let n be the
smallest, and consider the set

\[ T = \{a^0, a^1, a^2, \ldots, a^{n-1}\}. \tag{6.31} \]

Since n is the smallest natural number for which aⁿ = e, no aᵏ = e for 1 ≤ k ≤ n − 1.
Thus e appears exactly once in Eq. (6.31) as a⁰. More than that, all elements of T are
distinct (Exercise 3). Clearly, {aᵏ : 0 ≤ k ≤ n − 1} ⊆ {aᵏ : k ∈ Z}, so that T ⊆ ⟨a⟩. In
Exercise 3, you'll show T ⊇ ⟨a⟩ to have T = ⟨a⟩.

If n is the smallest natural number for which aⁿ = e, we say that the element a has
order n, and we denote the order of a by o(a). If aⁿ ≠ e for all n ∈ N, we say that a has
infinite order. In Definition 6.1.7, we defined the order of a group as its cardinality. Here,
we're defining the order of an element of a group in terms of its powers. By constructing ⟨a⟩
as we've done here, we've demonstrated that these two uses of the term order are tied
together.


Theorem 6.2.6. If G is a group and a ∈ G has order n, then the subgroup generated by a
has order n.
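
The two meanings of order can be watched side by side in the example of i worked out above. This is a sketch under stated assumptions rather than anything from the text: a complex number a + bi with integer parts is stored as the pair (a, b), products follow the rule (ac − bd) + (ad + bc)i from Eq. (6.19), and the names `cmul`, `order`, `cyclic_subgroup`, and the search bound `limit` are all invented here.

```python
def cmul(z, w):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i, with z = (a, b), w = (c, d)."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

ONE = (1, 0)

def order(z, limit=1000):
    """Smallest n in N with z^n = 1, or None if no n <= limit works."""
    power = z
    for n in range(1, limit + 1):
        if power == ONE:
            return n
        power = cmul(power, z)
    return None

def cyclic_subgroup(z):
    """List z^0, z^1, ..., z^(n-1) for an element z of finite order n."""
    n = order(z)
    elements, power = [], ONE
    for _ in range(n):
        elements.append(power)
        power = cmul(power, z)
    return elements

i = (0, 1)
print(order(i), cyclic_subgroup(i))
print(len(set(cyclic_subgroup(i))) == order(i))
```

A return value of `None` only means that no power up to the bound equals the identity; it is evidence of infinite order, not a proof. `cyclic_subgroup` is meant to be called only on elements for which `order` finds a value.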


Naturally, if G is finite and a € G, the subgroup generated by a will be finite. 




Example 6.2.7. For (Q, ×, 1, ⁻¹) from Example 6.1.12,
⟨j⟩ = {jⁿ : 0 ≤ n ≤ 3} = {1, j, −1, −j},
and j has order 4 in a group of eight elements.


In Exercises 6 and 7, you'll prove the following. 


Theorem 6.2.8. If G is a group and a ∈ G, then ⟨a⟩ is abelian.

Theorem 6.2.9. Suppose G is an abelian group, and a, b ∈ G. Then

\[ \langle \{a, b\} \rangle = \{a^m b^n : m, n \in \mathbb{Z}\}. \tag{6.32} \]


If G is a group and there exists some g ∈ G such that ⟨g⟩ = G, then G is said to be
cyclic, and g is called a generator. (See Exercise 13 for examples.) There are some nifty little
theorems about cyclic groups. By Theorem 6.2.8, a cyclic group is abelian. We'll point
out one more characteristic of cyclic groups in Exercise 12, but this and other results will
become pretty transparent after we discuss morphisms in Section 6.5.


EXERCISES 


1. Prove Theorem 6.2.2 by showing that the construction in Eq. (6.24) satisfies
properties U1–U3, and that any two subgroups H₁ and H₂ that both satisfy U1–U3 must
be equal.


2. Let (G, +, 0, −) be an additive group. Write Eqs. (6.13)–(6.18) in their additive
form.


3. Let G be a group, and let a ∈ G have the property that aⁿ = e, where n is the smallest
such natural number. Define T ⊆ G as in Eq. (6.31).


(a) Show that the elements of T are distinct. That is, if 0 ≤ k < l ≤ n − 1, then
aᵏ ≠ aˡ.
(b) Show T ⊇ ⟨a⟩ to have T = ⟨a⟩.¹⁰

4. Suppose xᵏ = e for some k ∈ N. Show that o(x) | k.


5. For each of the following groups, determine the subgroup generated by the given 
element. 
(a) (Z, +, 0, −); 5
(b) (R\{0}, ×, 1, ⁻¹); 5
(c) (S, ⊕, 0, −) from Example 6.1.5; 5
(d) (S, ⊕, 0, −) from Example 6.1.5; 2
(e) (S, ⊕, 0, −) from Example 6.1.5; 3
(f) (S, ⊕, 0, −) from Example 6.1.5; 0


¹⁰ Pick aᵏ ∈ ⟨a⟩ and apply the division algorithm to k and n.

(g) (Q, ×, 1, ⁻¹) from Example 6.1.12; j
(h) (C\{0}, ×, 1, ⁻¹); z = 1/√2 + (1/√2)i


6. Prove Theorem 6.2.8: If G is a group and a ∈ G, then ⟨a⟩ is abelian.


7. Prove Theorem 6.2.9: Suppose G is an abelian group, and a, b ∈ G. Then

\[ \langle \{a, b\} \rangle = \{a^m b^n : m, n \in \mathbb{Z}\}. \tag{6.33} \]


8. Write Eq. (6.33) in its additive form.

9. For the group (Z, +, 0, −), describe the following:

(a) 4) ©) 

(b) (10) N (3) 

(c) (8) (16) 

(d) ({4, 6}) 

(e) 4, 7) 

Consider the group (Z, +, 0, —), and let a, b € Z\{0}. Write g = gcd(a, b). Show 
that ({a, b}) = (g). 

Suppose G is a group, a € G, let m,n € N, and suppose gcd(m, n) = g. Show that 
({a", a"}) = (a8). 

Show that a cyclic group is countable. 

Are the following groups cyclic? Explain. 

(a) Z,+,0,-) 

(b) (S, @, 0, —) from Example 6.1.5 

(c) (R\{O}, x, 1,71) 

(d) (Q, x, 1,7!) from Example 6.1.12 

(e) (Q, +, 0, -) 


6.3 Integers Modulo n and Quotient Groups


In this section we return to Example 6.1.5 to derive it in a formal way from the integers. 
The result of this construction is called the integers modulo n (n = 6 for Example 6.1.5). 
This construction serves as a good illustration of what is called a quotient group, which 
we will discuss as a generalization of the process of deriving the integers modulo n. 


6.3.1 Integers modulo n 


In Example 6.1.5, we noted that $(S, \oplus, 0, -)$ is a group, where $S = \{0, 1, 2, 3, 4, 5\}$
and $\oplus$ was defined by Cayley Table 6.1. You might have noticed the similarity of $\oplus$ to
regular addition of integers, except that it was a sort of circular addition. Any sum such
as 4 + 5 that exceeded 5 in $\mathbb{Z}$ was reduced by 6 to guarantee that the sum is in $S$. This
circular summing is sometimes called clock arithmetic, where in this case, the numbers
$\{0, 1, 2, 3, 4, 5\}$ can be written around the perimeter of a clock and addition performed
in a natural circular way. This example of a group might sound a bit simplistic, but
in spite of its simplicity, it is one of the most important examples in group theory. Perhaps
we should say that because of its simplicity, its importance in the study of group theory is
particularly striking and beautiful.

Here is a standard way of constructing this group by beginning with the integers. It
takes a few steps and might seem a bit esoteric, but it's representative of a standard
procedure. Study it carefully. Instead of considering the special case of $S = \{0, 1, 2, 3, 4, 5\}$,
we consider the general case $\{0, 1, 2, \ldots, n - 1\}$.


Step 1. Construct elements of the set. Start with the group $(\mathbb{Z}, +, 0, -)$, and let
$n \in \mathbb{N}$ be given. Let

$$\langle n \rangle = \{kn : k \in \mathbb{Z}\} = \{\ldots, -3n, -2n, -n, 0, n, 2n, 3n, \ldots\} \qquad (6.34)$$

be the subgroup of $\mathbb{Z}$ generated by $n$. Define an equivalence relation on $\mathbb{Z}$ as follows.
Define

$$x \equiv_n y \quad \text{if and only if} \quad x - y \in \langle n \rangle. \qquad (6.35)$$

That is, $x \equiv_n y$ if and only if $x - y = kn$ for some $k \in \mathbb{Z}$. This is the same definition of
equivalence that we used in Example 2.5.5, so proving $\equiv_n$ is an equivalence relation
on $\mathbb{Z}$ has already been done. Recall from Exercise 6 in Section 2.7 that $x \equiv_n y$ if
and only if $x$ and $y$ have the same remainder when divided by $n$. Also, recall the
equivalence classes generated by $\equiv_n$:

$$\begin{aligned}
[0] &= \{\ldots, -2n, -n, 0, n, 2n, \ldots\} \\
[1] &= \{\ldots, -2n + 1, -n + 1, 1, n + 1, 2n + 1, \ldots\} \\
&\;\;\vdots \\
[k] &= \{\ldots, -2n + k, -n + k, k, n + k, 2n + k, \ldots\} \\
&\;\;\vdots \\
[n - 1] &= \{\ldots, -n - 1, -1, n - 1, 2n - 1, 3n - 1, \ldots\}.
\end{aligned} \qquad (6.36)$$

Notice the very important fact that $[0] = \langle n \rangle$; that is, the equivalence class of the
identity in $(\mathbb{Z}, +, 0, -)$ is the subgroup from which the equivalence is defined. Lump
these $n$ equivalence classes into a family and call it $\mathbb{Z}/\langle n \rangle$, the integers modulo $n$.

$$\mathbb{Z}/\langle n \rangle = \{[0], [1], [2], \ldots, [n - 1]\}. \qquad (6.37)$$

Recall that every equivalence class has infinitely many names depending on the
representative element by which we choose to address it. For example, in $\mathbb{Z}/\langle 6 \rangle$,
$[5] = [-1] = [41]$.
Step 2. Define a binary operation on the created set. On the elements of $\mathbb{Z}/\langle n \rangle$, which
are themselves sets, define a form of addition $\oplus_n$ in the following way:

$$[a] \oplus_n [b] = [a +_{\mathbb{Z}} b]. \qquad (6.38)$$

For example, $[4] \oplus_6 [5] = [4 + 5] = [9] = [3]$, where we choose to refer to the sum
as $[3]$ since 3 is a representative element of $[9]$ between 0 and 5.

Notice this new binary operation $\oplus_n$ is a way of combining two sets to produce
another set. It's not union or intersection, which up to now are the only binary
operations on sets we've seen. Instead, $\oplus_n$ combines $[a]$ and $[b]$ by using representative
elements from each to produce a representative element of the set we're calling
$[a] \oplus_n [b]$.


Step 3. Show that the created set with its binary operation gives rise to a group. We
must show that $(\mathbb{Z}/\langle n \rangle, \oplus_n)$ is a group by showing $\oplus_n$ is well defined, closed, and
associative. We must also find an identity element and find inverses for all elements.
Clearly, $\oplus_n$ is closed. For if $[a], [b] \in \mathbb{Z}/\langle n \rangle$, then $a + b \in \mathbb{Z}$. Since the equivalence
classes in $\mathbb{Z}/\langle n \rangle$ partition $\mathbb{Z}$, every integer is in some equivalence class. In particular,
$[a + b] \in \mathbb{Z}/\langle n \rangle$. In Exercise 1, you'll show the rest of the required properties. Showing
$\oplus_n$ is well defined is a great example of how one element of a set can have more than
one name, and how the resulting ambiguity must not affect how the sums work.
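The three steps can be tried out numerically. The following is only a sketch, assuming $n = 6$ and naming each class $[a]$ by its representative in $\{0, \ldots, n - 1\}$; the helper names `cls` and `add` are ours, not the text's. A finite check like this is no substitute for the proofs requested in Exercise 1, but it shows what "well defined" is asking for: swapping representatives must not change the answer.

```python
from itertools import product

n = 6  # assumed modulus for this illustration

def cls(a):
    """Canonical name of the class [a]: its representative in 0..n-1."""
    return a % n

def add(a, b):
    """[a] (+)_n [b] = [a + b], computed from representatives a and b."""
    return cls(a + b)

classes = range(n)

# Well defined: replacing a, b by other representatives a + s*n, b + t*n
# of the same classes does not change the resulting class.
well_defined = all(
    add(a + s * n, b + t * n) == add(a, b)
    for a, b in product(classes, repeat=2)
    for s, t in product(range(-3, 4), repeat=2)
)

associative = all(
    add(add(a, b), c) == add(a, add(b, c))
    for a, b, c in product(classes, repeat=3)
)
has_identity = all(add(0, a) == a == add(a, 0) for a in classes)
has_inverses = all(add(a, -a) == 0 for a in classes)

print(well_defined, associative, has_identity, has_inverses)
print(add(4, 5))  # [4] (+)_6 [5] = [9] = [3]
```

Only finitely many alternative representatives are tested here (shifts by up to three multiples of $n$), so the check illustrates the property rather than establishing it.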


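Now we're done. The following Cayley Table 6.39 displays the final product for the
particular case $\mathbb{Z}/\langle 6 \rangle$:

$$\begin{array}{c|cccccc}
\oplus_6 & [0] & [1] & [2] & [3] & [4] & [5] \\ \hline
[0] & [0] & [1] & [2] & [3] & [4] & [5] \\
[1] & [1] & [2] & [3] & [4] & [5] & [0] \\
[2] & [2] & [3] & [4] & [5] & [0] & [1] \\
[3] & [3] & [4] & [5] & [0] & [1] & [2] \\
[4] & [4] & [5] & [0] & [1] & [2] & [3] \\
[5] & [5] & [0] & [1] & [2] & [3] & [4]
\end{array} \qquad (6.39)$$
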
Take a closer look at the equivalence classes $[0], [1], \ldots, [n - 1]$, for there is another
standard notation for these sets that we'll use when we derive quotient groups in general.
Although we know from Eqs. (6.36) what is in each equivalence class, let's consider a
way to visualize an arbitrary $[k]$ in terms of the subgroup of the integers $\langle n \rangle = [0]$ that
motivated this whole mess in the first place. Picture all the integers on the number line
in standard fashion, and pretend that each integer is like a key on a piano keyboard. Take
countably infinitely many of your friends, and each of you place a finger on the elements of
$\langle n \rangle = [0]$, so that you are pointing to all the multiples of $n$: $\{\ldots, -2n, -n, 0, n, 2n, \ldots\}$.
Now suppose you want to point to all the elements of $[k]$. How can you do it? Everyone
in unison should lift his/her finger off the keyboard, and everyone should shuffle over to
the right $k$ units, then put his/her finger back down. In other words, to generate all the
elements of $[k]$, take all the elements of $\langle n \rangle$ and add $k$ to each one, so that you're sort of
translating each element of $\langle n \rangle$ through the set of integers by $k$ units. Here's yet another
way to say it:


$$[k] = \{x + k : x \in \langle n \rangle\}. \qquad (6.40)$$

Let's create new notation for the construction in Eq. (6.40), writing

$$\langle n \rangle + k = \{x + k : x \in \langle n \rangle\}, \qquad (6.41)$$

where we're slightly abusing orthodox notation by apparently combining the subgroup
$\langle n \rangle$ with an integer $k$ by using the binary operation defined for $\mathbb{Z}$. This notation is
standard, however, and it makes the definition of addition in Eq. (6.38) look like this:

$$[\langle n \rangle + a] \oplus_n [\langle n \rangle + b] = \langle n \rangle + [a + b]. \qquad (6.42)$$


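With this notation for the equivalence classes in $\mathbb{Z}/\langle n \rangle$, the Cayley Table, written in its
painfully rigorous form, would look like Table 6.43 for the case $n = 6$.

$$\begin{array}{c|cccccc}
\oplus_n & \langle n \rangle + 0 & \langle n \rangle + 1 & \langle n \rangle + 2 & \langle n \rangle + 3 & \langle n \rangle + 4 & \langle n \rangle + 5 \\ \hline
\langle n \rangle + 0 & \langle n \rangle + 0 & \langle n \rangle + 1 & \langle n \rangle + 2 & \langle n \rangle + 3 & \langle n \rangle + 4 & \langle n \rangle + 5 \\
\langle n \rangle + 1 & \langle n \rangle + 1 & \langle n \rangle + 2 & \langle n \rangle + 3 & \langle n \rangle + 4 & \langle n \rangle + 5 & \langle n \rangle + 0 \\
\langle n \rangle + 2 & \langle n \rangle + 2 & \langle n \rangle + 3 & \langle n \rangle + 4 & \langle n \rangle + 5 & \langle n \rangle + 0 & \langle n \rangle + 1 \\
\langle n \rangle + 3 & \langle n \rangle + 3 & \langle n \rangle + 4 & \langle n \rangle + 5 & \langle n \rangle + 0 & \langle n \rangle + 1 & \langle n \rangle + 2 \\
\langle n \rangle + 4 & \langle n \rangle + 4 & \langle n \rangle + 5 & \langle n \rangle + 0 & \langle n \rangle + 1 & \langle n \rangle + 2 & \langle n \rangle + 3 \\
\langle n \rangle + 5 & \langle n \rangle + 5 & \langle n \rangle + 0 & \langle n \rangle + 1 & \langle n \rangle + 2 & \langle n \rangle + 3 & \langle n \rangle + 4
\end{array} \qquad (6.43)$$

Now let's relax the notation for the specific context of $\mathbb{Z}/\langle n \rangle$. First, $\mathbb{Z}/\langle n \rangle$ is usually
written $\mathbb{Z}_n$. Also, instead of always writing $\langle n \rangle + k$, we allow ourselves simply to write $k$,
understanding that we are not talking about a single integer $k$, but the entire equivalence
class of integers of which $k$ is a representative element.
We also generally revert back to the regular addition sign $+$ rather than always writing $\oplus_n$.
What this boils down to is that mathematicians get a little lazy and let the familiar notation
from $(\mathbb{Z}, +, 0, -)$ also serve for the group $\mathbb{Z}/\langle n \rangle$ with addition, so that $(\mathbb{Z}_n, +, 0, -)$ is a
relaxed notation for $(\mathbb{Z}/\langle n \rangle, \oplus_n, \langle n \rangle + 0, -)$. The reason this is probably all right is that
we are less interested in fancy notation than we are in the internal structure of $\mathbb{Z}_n$ as a
group. And it's just as helpful to visualize $\mathbb{Z}_n$ as clock arithmetic with regular old numbers
as it is to think in terms of combining equivalence classes of integers. Understood this
way, we can then write something like $17 + 9 \equiv_6 2$ and know just what we mean.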


6.3.2 Quotient groups

The derivation of $\mathbb{Z}_n$ is affected by the fact that $+$ is a commutative operation on $\mathbb{Z}$. If
you've already done Exercise 1, did you notice at what point you exploited this fact?¹¹
Now let's generalize the program for deriving $\mathbb{Z}_n$ to create a quotient group in a more
abstract setting. At first, we're going to restrict ourselves to an abelian group (which we
shouldn't have to do). By doing this, we avoid an obstacle in showing that the binary
operation on the quotient group is well defined. Then in Section 6.4 we'll address the
non-abelian case. For now, simply try not to use the fact that $G$ is abelian anywhere it's
not absolutely necessary, and take note of the step(s) in the proof where you do use it.
The proofs of all the steps along the way in this derivation are left to you in Exercise 2.
As you read them, refer back to their parallel steps in $\mathbb{Z}_n$, and note how the new notation
here is a generalization of the $\mathbb{Z}_n$ notation to an arbitrary group.


¹¹ Somewhere in showing $\oplus_n$ is well defined.
Step 1. Given an abelian group $(G, *, e, {}^{-1})$ and a subgroup $H \le G$, we define an
equivalence relation $\equiv_H$ on $G$ in the following way. Define

$$a \equiv_H b \iff a * b^{-1} \in H. \qquad (6.44)$$

First, we would need to show that this definition of equivalence is in fact an equivalence
relation, so that $G$ is partitioned into equivalence classes. Call the set of
equivalence classes $G/H$, read "G mod H."


Before we complete the construction of the group $G/H$ by defining its binary operation,
let's see how the equivalence classes generated by (6.44) look. In $\mathbb{Z}_n$, we have a
pretty clear picture of what is in each equivalence class, as illustrated by Eqs. (6.36). In a
general group, though, we want a way to visualize the elements of $[g]$ for a given $g \in G$.
Therefore we ask what the statement $x \in [g]$ means. In Exercise 2, you'll show that

$$[g] = \{h * g : h \in H\}; \qquad (6.45)$$

that is, you'll show $x \in [g]$ if and only if there exists some $h \in H$ such that $x = h * g$.
You can think of this as saying that $x$ is an element of $G$ that is scooted over by $*$ exactly $g$
amount from some element of $H$. The set in Eq. (6.45) is called the right coset of $H$
generated by $g$, and notationally we write

$$H * g = \{h * g : h \in H\}. \qquad (6.46)$$

A coset $H * g$ is strikingly similar to the set in Eq. (6.40), and can be visualized in a
similar way as a translation of all elements of $H$ through $G$ by $g$ units, so to speak.
Writing $G/H = \{H * g : g \in G\}$, the set of all cosets of $H$, we're ready to continue the
construction of the quotient group by defining a binary operation.


Step 2. Define a binary operation $*_H$ on $G/H$ in the following way:

$$(H * a) *_H (H * b) = H * (a * b). \qquad (6.47)$$


Step 3. Show that $*_H$ is a well-defined, closed, associative binary operation on $G/H$,
find the identity element, and find inverses for elements.


Unless you're unnecessarily sloppy with your algebraic manipulation in completing 
steps 1-3, there is only one place where you must exploit the fact that G is abelian, and 
it is the same place where you exploited commutativity of + on Z in your work with @,. 
In Section 6.4, we'll return to this construction in the context of an arbitrary group that 
might not be abelian. However, even then, if the program is going to work, we cannot 
completely do without some way to switch the order of certain elements. In Section 6.4, 
we'll guarantee the property we need by requiring H to be a special kind of subgroup of 
G called a normal subgroup. 

The set of all cosets with $*_H$ forms the new group, which we call the quotient group
generated by $H$. Here's the definition in its entirety.


Definition 6.3.1. Suppose $(G, *, e, {}^{-1})$ is a group (abelian), and let $H \le G$. For $a, b \in G$,
define an equivalence relation $\equiv_H$ by declaring $a \equiv_H b$ if and only if $a * b^{-1} \in H$. Then

$$G/H = \{H * g : g \in G\} \qquad (6.48)$$

with binary operation $*_H$ defined by

$$(H * a) *_H (H * b) = H * (a * b), \qquad (6.49)$$

identity element $H * e = H$, and inverse $(H * a)^{-1} = H * a^{-1}$ is called the quotient group
of $G$ created by modding out the subgroup $H$. Elements of $G/H$ are called right cosets of $H$.


For equivalence classes in general, we know that $[a] = [b]$ if and only if $a \equiv b$;
otherwise, $[a] \cap [b] = \emptyset$. In the context of Definition 6.3.1, these facts become $H * a = H * b$
if and only if $a * b^{-1} \in H$; otherwise, $(H * a) \cap (H * b) = \emptyset$. In other words, $a$ and
$b$ generate the same coset of $H$ if and only if $a * b^{-1} \in H$; otherwise, the cosets they
generate are disjoint.
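To see the coset criterion at work in a small case, here is a sketch that assumes the additive group $\mathbb{Z}_{12}$ and the subgroup $H = \{0, 4, 8\}$. Both choices are ours, made only for illustration, and in additive notation $a * b^{-1}$ becomes $a - b$.

```python
n = 12
G = range(n)
H = frozenset({0, 4, 8})  # the subgroup generated by 4 in Z_12 (assumed example)

def coset(g):
    """The coset H + g = {h + g : h in H}, in additive notation."""
    return frozenset((h + g) % n for h in H)

cosets = {coset(g) for g in G}

# a and b generate the same coset exactly when a - b is in H;
# otherwise the two cosets are disjoint.
criterion_holds = all(
    (coset(a) == coset(b)) == ((a - b) % n in H)
    and (coset(a) == coset(b) or not (coset(a) & coset(b)))
    for a in G for b in G
)

print(sorted(sorted(c) for c in cosets))
print(criterion_holds)
```

The twelve elements fall into four cosets of three elements each, and any two cosets are either identical or disjoint, just as the equivalence class picture predicts.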


Example 6.3.2. Let $G$ be the set of all functions $f : \mathbb{R} \to \mathbb{R}$. Define $f + g$ in the following
way. For $x \in \mathbb{R}$, let $[f + g](x) = f(x) + g(x)$. Clearly, the function 0 defined by $0(x) = 0$
for all $x \in \mathbb{R}$ is the identity element, and $-f$, defined by $[-f](x) = -f(x)$, is the additive
inverse of $f \in G$. Thus $(G, +, 0, -)$ is a group. Let $H \subseteq G$ be the set of all constant
functions, which is clearly a subgroup of $G$.


What do the elements of the quotient group $G/H$ look like? If $f \in G$ is given, then
$H + f = \{h + f : h \in H\}$ is the set of all translations of $f$ up and down in the plane by
the constant functions in $H$. That is, $g \in H + f$ if there exists some constant function
$h \in H$ such that $g = h + f$. This might remind you of a fact from calculus. If $f$ is an
antiderivative of a function $f_1$, then $H + f$ is the set of all the antiderivatives of $f_1$.


6.3.3 Cosets and Lagrange's theorem

Even if $G$ is not an abelian group and the family of cosets of a subgroup of $G$ does not give
rise to a quotient group, we can still derive some important results about elements and
subgroups of $G$ by looking at the cosets of the subgroup. One thing to keep in mind if
$G$ is not abelian is that we have to distinguish between left and right cosets. For $H \le G$
and some fixed $g \in G$, we use the notation

$$g * H = gH = \{g * h : h \in H\} \qquad (6.50)$$
$$H * g = Hg = \{h * g : h \in H\} \qquad (6.51)$$

to denote the left and right cosets generated by $g$, respectively (see Exercise 3). Whether
or not these cosets are the same for a particular $g \in G$ is a question for Section 6.4.
However, because the results we want to derive in this section are the same for either left
or right cosets, we'll look only at right cosets. First, all cosets of a given $H \le G$ have the
same cardinality (Exercise 4).


Theorem 6.3.3. If $G$ is a group and $H \le G$, then $|H| = |Hg|$ for all $g \in G$.


Regardless of whether G is of finite or infinite order, the number of cosets of H in 
G might be finite. If so, we call the number of cosets of H the index of H in G, and 
we denote this number by (G: H). Naturally, if G is finite, then so is (G: H), and the 
following theorem is immediate as an implication of Theorem 6.3.3. 
Theorem 6.3.4 (Lagrange). If $G$ is a finite group and $H \le G$, then the order of $H$ divides
the order of $G$.


Proof: Since cosets are disjoint and all have the same cardinality as $H$, it follows that
$|G| = (G : H) \times |H|$. ■


If $G$ is a finite group and $x \in G$, then Lagrange's theorem states that the order of the
subgroup generated by $x$ divides $o(G)$. Since the order of $\langle x \rangle$ is the same as the order
of the element $x$ (Theorem 6.2.6), we have $o(x) \mid o(G)$. Beginning here, it is possible to
prove several results about elements of a group (see Exercises 5-7).
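The counting in Lagrange's theorem can be watched in a small example. The sketch below again assumes the additive group $\mathbb{Z}_{12}$ (our choice, not the text's) and checks, for every element $x$, that $|\langle x \rangle| = o(x)$, that $|G| = (G : H) \times |H|$ with $H = \langle x \rangle$, and that $o(x)$ divides $o(G)$.

```python
n = 12  # assumed example: the additive group Z_12

def order(x):
    """Smallest k >= 1 with k*x = 0 in Z_n (additive notation)."""
    k, y = 1, x % n
    while y != 0:
        y = (y + x) % n
        k += 1
    return k

def generated(x):
    """The cyclic subgroup <x> of Z_n: all multiples of x."""
    return frozenset((k * x) % n for k in range(n))

for x in range(n):
    H = generated(x)
    cosets = {frozenset((h + g) % n for h in H) for g in range(n)}
    index = len(cosets)
    assert len(H) == order(x)    # Theorem 6.2.6: |<x>| = o(x)
    assert n == index * len(H)   # |G| = (G : H) * |H|
    assert n % order(x) == 0     # o(x) divides o(G)

print({x: order(x) for x in range(n)})
```

Every order that shows up is a divisor of 12, although the checks say nothing yet about whether every divisor must occur.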


EXERCISES 


1. For the construction $(\mathbb{Z}_n, \oplus_n)$ as discussed in Section 6.3.1:

(a) Show that $\oplus_n$ is well defined on $\mathbb{Z}_n$.
(b) Show that $\oplus_n$ is associative.
(c) Determine the identity element.
(d) Determine the inverse of $[k]$.


2. Suppose $(G, *, e, {}^{-1})$ is an abelian group and $H \le G$. For $a, b \in G$, define $a \equiv_H b$
if $a * b^{-1} \in H$.

(a) Show that $\equiv_H$ is an equivalence relation on $G$.
(b) Show that the equivalence classes generated by $\equiv_H$ are cosets of $H$. That is, for
any given $g \in G$, $[g] = H * g$.
(c) Define $*_H$ on $G/H$ by $(H * a) *_H (H * b) = H * (a * b)$. Verify that $G/H$
with $*_H$ is a group by showing the following:
G1: Show that $*_H$ is well defined on $G/H$.
G2: Show that $*_H$ is closed on $G/H$.
G3: Show that $*_H$ is associative on $G/H$.
G4: Show that $H * e$ is the identity element of $G/H$.
G5: Show that every $H * a$ has an inverse in $G/H$.


3. Find all cosets of $H = \{\pm 1, \pm i\}$ in the quaternion group. For each possible $g \in Q$, is
$gH = Hg$?


4. Prove Theorem 6.3.3: If $G$ is a group and $H \le G$, then $|H| = |Hg|$ for all $g \in G$.


5. Suppose $o(G) = n$, and $g \in G$. Show that $g^n = e$.¹²


6. Suppose $G$ is cyclic of order $n$. Show that if $k \mid n$, then there exists a subgroup of $G$ of
order $k$.


7. Show that a group of prime order is cyclic.¹³


¹² Apply Theorem 6.2.6 and Lagrange's theorem to $\langle g \rangle$.
¹³ Any element except $e$ is a generator.
6.4 Permutation Groups and Normal Subgroups


In this section, we take an in-depth look at a particular class of groups, and two particularly
important subgroups that derive from it. Our main purpose here is to introduce the idea 
of a normal subgroup and to see how it’s just the right thing to patch up the hole we left 
in our derivation of quotient groups by requiring that G be abelian. 


6.4.1 Permutation groups 


Let $A$ be any nonempty set, and let $S$ be the set of all one-to-one, onto functions $f : A \to A$. We want
to take a close look at the group formed on $S$ with the binary operation of composition.
First, note that composition as a binary operation is well defined on $S$. For if $f_1 = f_2$
and $g_1 = g_2$, then $f_1(a) = f_2(a)$ and $g_1(a) = g_2(a)$ for all $a \in A$. Thus

$$(f_1 \circ g_1)(a) = f_1[g_1(a)] = f_2[g_2(a)] = (f_2 \circ g_2)(a) \qquad (6.52)$$

for all $a \in A$, and we have that $f_1 \circ g_1 = f_2 \circ g_2$. Since the composition of one-to-one
onto functions is a one-to-one onto function, composition is closed on $S$. The fact that
composition is associative is a mere exercise in the manipulation of parentheses. For if
we pick any $a \in A$, then

$$[(f \circ g) \circ h](a) = (f \circ g)[h(a)] = f(g(h(a))) = f[(g \circ h)(a)] = [f \circ (g \circ h)](a). \qquad (6.53)$$

Thus $[(f \circ g) \circ h](a) = [f \circ (g \circ h)](a)$ for all $a \in A$, so that $(f \circ g) \circ h = f \circ (g \circ h)$,
and $\circ$ is associative on $S$. Writing $i : A \to A$, the identity function, then for a given
$f \in S$ and any $a \in A$,

$$(f \circ i)(a) = f[i(a)] = f(a) \quad \text{and} \quad (i \circ f)(a) = i[f(a)] = f(a). \qquad (6.54)$$

Thus $i \circ f = f \circ i = f$ for all $f \in S$ and $i$ is the identity element. Furthermore, by
Theorem 3.2.5, every $f \in S$ has its inverse $f^{-1} \in S$. Since

$$f[f^{-1}(a)] = a = i(a) \quad \text{and} \quad f^{-1}[f(a)] = a = i(a) \qquad (6.55)$$

for all $a \in A$, we have $f \circ f^{-1} = f^{-1} \circ f = i$. Thus $(S, \circ, i, {}^{-1})$ is a group, called the
permutation group on $A$, and elements of $S$ are called permutations of $A$.
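For a small set, each of these group properties can be confirmed by brute force. The sketch below assumes $A = \{1, 2, 3\}$ and stores each permutation as a Python dict sending $a$ to $f(a)$; that representation and the helper names are our own choices for illustration.

```python
from itertools import permutations

A = (1, 2, 3)  # assumed small example
# Each permutation f of A is stored as a dict a -> f(a).
S = [dict(zip(A, image)) for image in permutations(A)]

def compose(f, g):
    """(f o g)(a) = f[g(a)]: apply g first, then f."""
    return {a: f[g[a]] for a in A}

def inverse(f):
    """The inverse function: it sends f(a) back to a."""
    return {f[a]: a for a in A}

identity = {a: a for a in A}

closed = all(compose(f, g) in S for f in S for g in S)
associative = all(
    compose(compose(f, g), h) == compose(f, compose(g, h))
    for f in S for g in S for h in S
)
identity_works = all(compose(f, identity) == f == compose(identity, f) for f in S)
inverses_work = all(
    compose(f, inverse(f)) == identity == compose(inverse(f), f) for f in S
)

print(len(S), closed, associative, identity_works, inverses_work)
```

The checks mirror Eqs. (6.52)-(6.55) one by one, but only for this particular $A$; the argument in the text is what covers every nonempty set.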


Example 6.4.1. Let $A = \{1, 2, 3, 4, 5, 6\}$. For this $A$, the permutation group is denoted
$S_6$, and is called the symmetric group on six elements. One way to visualize what $f \in S_6$
does is to imagine six fixed slots, numbered 1-6, with some sort of object in each position.
If $f(2) = 5$, this means that the object in slot 2 is moved to slot 5. Thus every element
of $S_6$ does a sort of fruitbasket turnover of the ordered numbers $(1, 2, 3, 4, 5, 6)$. There
are several ways to describe notationally a particular $f \in S_6$. One way is to display the
image of every element of $A$ as follows:

$$f = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 3 & 1 & 2 & 4 & 6 & 5 \end{pmatrix}, \qquad (6.56)$$

which means $f(1) = 3$, $f(2) = 1$, $f(3) = 2$, $f(4) = 4$, $f(5) = 6$, and $f(6) = 5$. When
we compose two permutations in $S_6$, we work right to left as usual. For example, if we

write
12345 6 
ee 1462 2) ied) 


we may calculate $f \circ g$ element by element. To find $(f \circ g)(1)$, we see 1 is first mapped
to 3 by $g$, then 3 is mapped to 2 by $f$. Thus $(f \circ g)(1) = 2$. Doing the same for the
remaining elements yields


Bags ee Sie fo Beas 

8 Ng a ce FP ON Sd A he BSS 
bdo Se AS AG 
po) (6.58) 


Calculating $g \circ f$ reveals that $(S_6, \circ, i, {}^{-1})$ is not abelian:


12345 6 
HS (4 oq ea) ee) 
Also, 
4 
12345 6 i OS A558 
eee So eee) oo) 


Another way to describe f from Eq. (6.56) more succinctly is with cycle notation, 
where f = (132)(56) means 1 is mapped to 3, 3 is mapped to 2, and 2 is mapped to 1 for 
one cycle. The other cycle indicates 5 is mapped to 6, and 6 is mapped to 5. The absence 
of 4 means it’s mapped to itself. Since (132) and (56) are disjoint, it doesn’t matter which 
you write first, nor does it matter whether you think of f as a single permutation or as 
the composition of (132) and (56). If you want, you can scroll the numbers in a cycle 
around to make it start with any element in the cycle. Thus (132) = (213) = (321). The 
identity mapping in $S_n$ is generally written (1). If you compose several permutations in
cycle notation that are not disjoint, you can simplify the expression into disjoint cycles. 


Example 6.4.2. Simplify f = (132)(14)(24)(23)(563). 


Solution: Find f(1) first by tracing 1 through the cycles. Since composition of 
functions is performed from right to left, we see that 1 is first mapped to 4 by (14), 
and 4 is not mapped elsewhere by (132). Thus f(1) = 4. Now find f (4) by noting 
that 4 is mapped to 2 by (24), then 2 is mapped to 1 by (132). Thus f(4) = 1, and 
we have completed one cycle (14). Continuing with 2, we see that 2 is mapped to 3 
by (23), then 3 is mapped to 2 by (132). Thus f(2) = 2, and we can omit it in the 
cycle notation. Continuing with 3, we see f(3) = 5, f(5) = 6, and f(6) = 3. Thus 
the simplified expression is $f = (14)(356)$. ■
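The element-by-element tracing in this solution is mechanical enough to hand to a short program. The following is a sketch under our own conventions (cycles as Python tuples, permutations as dicts on $\{1, \ldots, n\}$, and helper names chosen for illustration): it composes the cycles from right to left and then rewrites the result as disjoint cycles.

```python
def cycle_to_map(cycle, n):
    """Turn a cycle such as (1, 3, 2) into a dict on {1, ..., n}."""
    mapping = {a: a for a in range(1, n + 1)}
    for position, a in enumerate(cycle):
        mapping[a] = cycle[(position + 1) % len(cycle)]
    return mapping

def compose_cycles(cycles, n):
    """Compose cycles written left to right, applying the rightmost first."""
    result = {a: a for a in range(1, n + 1)}
    for cycle in reversed(cycles):
        step = cycle_to_map(cycle, n)
        result = {a: step[result[a]] for a in result}
    return result

def disjoint_cycles(mapping):
    """Write a permutation as disjoint cycles, omitting fixed points."""
    seen, cycles = set(), []
    for start in sorted(mapping):
        if start in seen or mapping[start] == start:
            continue
        cycle, a = [], start
        while a not in seen:
            seen.add(a)
            cycle.append(a)
            a = mapping[a]
        cycles.append(tuple(cycle))
    return cycles

# f = (132)(14)(24)(23)(563) from Example 6.4.2
f = compose_cycles([(1, 3, 2), (1, 4), (2, 4), (2, 3), (5, 6, 3)], 6)
print(disjoint_cycles(f))  # [(1, 4), (3, 5, 6)]
```

The output agrees with the hand computation $f = (14)(356)$, with the fixed point 2 omitted.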


How many elements are there in $S_6$? What about in $S_n$ for any $n \in \mathbb{N}$? If you covered
Section 3.4, you know that $|S_n| = n!$. If you omitted that section, however, we'll just
point out that the question is equivalent to asking how many ways there are to arrange
the elements of $\mathbb{N}_n$. We ask how many possible numbers can go in the first position of
the bottom row of (6.56), then the second position, and so on. Multiplying these, we see
$|S_n| = n! = n \cdot (n - 1) \cdot (n - 2) \cdots 2 \cdot 1$. Thus, $S_n$ is a group on $n!$ elements, and if
$n \ge 3$, $S_n$ is not abelian.


Example 6.4.3. For $f = (132)(56)$ in $S_6$, determine $\langle f \rangle$.


Solution: By Exercise 3 from Section 6.2 the subgroup generated by $f$ must contain
all powers of $f$.

$$\begin{aligned}
f^2 &= f \circ f = (132)(56) \circ (132)(56) = (123), \\
f^3 &= f^2 \circ f = (123) \circ (132)(56) = (56), \\
f^4 &= f^3 \circ f = (56) \circ (132)(56) = (132), \\
f^5 &= f^4 \circ f = (132) \circ (132)(56) = (123)(56), \\
f^6 &= f^5 \circ f = (123)(56) \circ (132)(56) = (1).
\end{aligned} \qquad (6.61)$$

Thus

$$\langle f \rangle = \{f^0, f^1, f^2, f^3, f^4, f^5\} = \{(1), (132)(56), (123), (56), (132), (123)(56)\}. \qquad (6.62)$$
■
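The list of powers can be regenerated by repeated composition. This sketch stores $f = (132)(56)$ as a dict of images (our representation, chosen for illustration) and multiplies by $f$ until the identity reappears.

```python
# f = (132)(56) as a dict: f(1)=3, f(3)=2, f(2)=1, f(5)=6, f(6)=5, f(4)=4.
f = {1: 3, 2: 1, 3: 2, 4: 4, 5: 6, 6: 5}
identity = {a: a for a in f}

def compose(p, q):
    """(p o q)(a) = p[q(a)]: apply q first, then p."""
    return {a: p[q[a]] for a in q}

# Collect f^0, f^1, f^2, ... until the next power is the identity again.
powers = [identity]
while True:
    nxt = compose(powers[-1], f)
    if nxt == identity:
        break
    powers.append(nxt)

print(len(powers))  # 6
for k, p in enumerate(powers):
    # Bottom row of the two-row notation for f^k.
    print(k, [p[a] for a in sorted(p)])
```

The six rows printed are the bottom rows of the two-row notation for $f^0$ through $f^5$, matching the six elements listed in Eq. (6.62).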


6.4.2 The alternating group $A_n$
Now let's look at an important subgroup of $S_n$. A permutation of the form $(ij)$ is called a
transposition because all it does is switch the positions of two elements. Notice $(ij)^{-1} = (ij)$
and $(ij) = (ji)$. If $i = 1$ (or $j = 1$, it doesn't matter), then $(ij)$ becomes $(1j)$. If neither
$i$ nor $j$ is 1, then

$$(ij) = (1i)(1j)(1i). \qquad (6.63)$$

Therefore, if you're playing some game where objects are lined up in positions $1, \ldots, n$
and you want to swap the positions of the objects in positions $i$ and $j$, it's possible to
do it even if you restrict yourself to swapping an object's position only with the object in
position 1, only it might take a little longer. However, notice that $(ij)$ and its equivalent
in Eq. (6.63) both use an odd number of transpositions.

Even though $(ijk)$ is not a transposition, it can be written as a composition of several
transpositions. See if you can figure out how to send $i$ to $j$, $j$ to $k$, and $k$ to $i$ by several
transpositions involving only $\{i, j, k\}$. A possible answer:

$$(ijk) = (ik)(ij). \qquad (6.64)$$

If none of $\{i, j, k\}$ is 1, what does Eq. (6.64) become if we require that all transpositions
involve 1, as in Eq. (6.63)? Taking a hint from Eq. (6.63) and applying it to both $(ik)$
and $(ij)$, we can write

$$(ijk) = (1i)(1k)(1i)(1i)(1j)(1i). \qquad (6.65)$$

Since $(1i)^{-1} = (1i)$, the two transpositions in the middle of Eq. (6.65) cancel to yield

$$(ijk) = (1i)(1k)(1j)(1i). \qquad (6.66)$$

Now $(ijk) = (jki) = (kij)$. Thus, if one of $\{i, j, k\}$ is 1, we may assume $i = 1$, and
observe that $(1jk) = (1k)(1j)$. However, notice this one fact. Every form of $(ijk)$ that
we have written here involves an even number of transpositions.

Suppose we now consider the cycle $\sigma = (x_1, x_2, \ldots, x_m)$, where no $x_k = 1$. Can
you take a hint from Eq. (6.66) and jump right to a similar form for $\sigma$ that involves only
transpositions of the form $(1x_k)$? How about

$$\sigma = (1x_1)(1x_m)(1x_{m-1})(1x_{m-2}) \cdots (1x_2)(1x_1)? \qquad (6.67)$$

If some $x_k = 1$, rotate the terms of $\sigma$ to have $x_1 = 1$. Then Eq. (6.67) works by deleting
the transpositions on each end. Be assured there are many other answers. One thing
we would like to know is whether the ways of writing $\sigma$ with transpositions all have
something in common.
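Equations (6.63), (6.66), and (6.67) can each be spot-checked for particular values. In the sketch below the values of $i$, $j$, $k$, the cycle $\sigma = (2\,4\,7\,3\,6)$, and $n = 7$ are all arbitrary choices of ours, and the helper names are illustrative only.

```python
def transposition(a, b, n):
    """The permutation of {1, ..., n} that swaps a and b."""
    m = {x: x for x in range(1, n + 1)}
    m[a], m[b] = b, a
    return m

def cycle(xs, n):
    """The cycle (x_1 x_2 ... x_m) on {1, ..., n}."""
    m = {x: x for x in range(1, n + 1)}
    for k, x in enumerate(xs):
        m[x] = xs[(k + 1) % len(xs)]
    return m

def compose(*perms):
    """Compose permutations written left to right, rightmost applied first."""
    result = {x: x for x in perms[0]}
    for p in reversed(perms):
        result = {x: p[result[x]] for x in result}
    return result

n = 7
t = lambda a: transposition(1, a, n)  # shorthand for the transposition (1a)

# Eq. (6.63) with i = 3, j = 5
assert transposition(3, 5, n) == compose(t(3), t(5), t(3))
# Eq. (6.66) with i, j, k = 3, 5, 6
assert cycle((3, 5, 6), n) == compose(t(3), t(6), t(5), t(3))
# Eq. (6.67) with sigma = (2 4 7 3 6): (1 x_1)(1 x_m)...(1 x_2)(1 x_1)
xs = (2, 4, 7, 3, 6)
factors = [t(xs[0])] + [t(x) for x in reversed(xs[1:])] + [t(xs[0])]
assert cycle(xs, n) == compose(*factors)
print(len(factors))  # m + 1 = 6 transpositions for a cycle of length m = 5
```

A passing check for a few values is evidence, not proof, but it is a quick way to catch an index slip when writing out a decomposition like Eq. (6.67).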

From all this work, we can see that any $f \in S_n$, once written in cycle notation with
disjoint cycles, can be broken down into a composition of transpositions in at least some
way. The identity can be thought of as requiring zero transpositions, or if you prefer, we
can write $(1) = (12)(12)$ for $n \ge 2$. One characteristic of elements of $S_n$ that we want to
point out, but not prove in this text, is that of all possible decompositions of a particular
$f \in S_n$ into transpositions, either they will all involve an even number of transpositions,
or they will all involve an odd number. This is not a trivial fact to demonstrate. Therefore,
we'll just accept it here and leave the proof to your later coursework in algebra. Thus the
$n!$ elements of $S_n$ are partitioned into two classes. If $f \in S_n$ always decomposes into
an even number of transpositions, then $f$ is called an even permutation. Similarly, if
$f \in S_n$ always decomposes into an odd number of transpositions, then $f$ is called an
odd permutation.

Now let's create an important subset of $S_n$. Let

$$A_n = \{f \in S_n : f \text{ is an even permutation}\}. \qquad (6.68)$$

What do you think the relationship between $A_n$ and $S_n$ is? You'll prove the following in
Exercise 2.


Theorem 6.4.4. Let $n \in \mathbb{N}$. Then $A_n \le S_n$.


We call $A_n$ the alternating group on $n$ elements.

Given that $|S_n| = n!$, how many elements do you think $A_n$ has? Your first thought
might be that since every element of $S_n$ is either even or odd, it would only be natural
that half would be even and half would be odd, so that $|A_n| = n!/2$. If so, you're right.
In Exercise 4 from Section 6.3, you showed that all cosets of a subgroup have the same
cardinality. Thus, if you can show that $A_n$ has precisely two cosets, that is, itself and
$S_n \setminus A_n$, it will follow that $|A_n| = n!/2$ (Exercise 3).
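The count $|A_n| = n!/2$ is easy to confirm for small $n$. The sketch below makes two assumptions of its own: permutations act on $\{0, \ldots, n - 1\}$ rather than $\{1, \ldots, n\}$ (a relabeling for Python's convenience), and parity is read off the disjoint cycle form, using the fact that a cycle of length $m$ is a product of $m - 1$ transpositions. It leans on the unproved fact quoted above that parity does not depend on the decomposition.

```python
from itertools import permutations
from math import factorial

def is_even(perm):
    """perm is a tuple with perm[k] the image of k (0-based labels).
    A cycle of length m is a product of m - 1 transpositions, so the
    permutation is even when the total over its cycles is even."""
    seen, transpositions = set(), 0
    for start in range(len(perm)):
        length, a = 0, start
        while a not in seen:
            seen.add(a)
            a = perm[a]
            length += 1
        if length:
            transpositions += length - 1
    return transpositions % 2 == 0

def compose(p, q):
    """(p o q)(k) = p[q[k]]: apply q first, then p."""
    return tuple(p[q[k]] for k in range(len(q)))

for n in range(2, 7):
    A_n = [p for p in permutations(range(n)) if is_even(p)]
    assert len(A_n) == factorial(n) // 2
    print(n, len(A_n))

# Closure of A_4 under composition, one ingredient of Theorem 6.4.4.
A_4 = {p for p in permutations(range(4)) if is_even(p)}
closed = all(compose(p, q) in A_4 for p in A_4 for q in A_4)
print(closed)
```

The closure check for $A_4$ is only one of the things Exercise 2 asks for, and only for $n = 4$; it is meant as a sanity check on the definition, not as the proof.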


6.4.3 The dihedral group $D_8$


Now let us look at an important subgroup of $S_4$. Consider a rigid square with the numbers
1, 2, 3, 4 etched on its corners, sitting in the $xy$-plane, where the positions of each corner
are also written in the plane (see Fig. 6.1). Consider the following two moves for the square:


1. a rotation 90° counterclockwise, a move we'll call $\rho$ (the Greek letter rho); and

2. a flip upside down, sending the top corner to the bottom, and vice versa, and leaving
the side corners fixed. Call this move $\phi$ (the Greek letter phi).


Figure 6.1 A square in the $xy$-plane for the dihedral group.


The move $\rho$ can be expressed as an element of $S_4$; specifically, $\rho = (1234)$. Visualize
$\rho$ and the cycle notation expression of it as sending the corner that initially occupies
position 1 in the $xy$-plane to position 2, and so on. Similarly, $\phi = (24)$, which sends
the corner initially in position 2 to position 4, and vice versa. If we consider all possible
ultimate positions of the square that can result from combining moves $\rho$ and $\phi$ in any
way by composition, we have created a way of visualizing the subgroup of $S_4$ generated
by $\{\rho, \phi\}$. This subgroup of $S_4$ is called the dihedral group, and is denoted $D_8$.

First, let's note how many elements there are in $D_8$. We ask how many possible ultimate
positions are there for the square that can result from a combination of rotations and
flips. Perhaps it's clear that the square can end up either top side up or top side down,
and in any one of four states of rotation. Thus eight distinguishable ultimate positions
are possible, and $o(D_8) = 8$.

Let's create a Cayley table for $D_8$ in the following way. Each element of $D_8$, that is, each
ultimate position for the square, can be obtained by doing any necessary rotation first,
and then flipping afterward, if necessary. Thus every element of $D_8$ can be written in the
form $\phi^m \rho^n$. As an example, $\phi\rho^3$ would rotate 270° counterclockwise and then flip. This
maneuver could be written in cycle notation as $(24)(1234)^3$, which upon simplification
becomes $(12)(34)$. Also, since $o(\rho) = 4$ and $o(\phi) = 2$, $\phi^m \rho^n$ can always be written with
$0 \le m \le 1$ and $0 \le n \le 3$. Writing elements of $D_8$ in this way, we have

$$D_8 = \{i, \rho, \rho^2, \rho^3, \phi, \phi\rho, \phi\rho^2, \phi\rho^3\}. \qquad (6.69)$$
To fill in the Cayley table values, we must combine all elements of $D_8$ and write the
compositions in a form from Eq. (6.69). This takes a little bit of work, but yields some
useful principles as you work through it. For example, what is $(\phi\rho) \circ (\phi) = \phi\rho\phi$? The first
maneuver is a flip, followed by a 90° rotation and flip. Which element of $D_8$ from Eq. (6.69)
is the equivalent of this set of moves? Perhaps you can see that $\phi\rho\phi = (1432) = \rho^3$.
Table 6.70 contains a few of the entries for the Cayley table of $D_8$. In Exercise 4, you'll
finish out this table, and be pointed to a systematic approach that can save you a lot of
time. Remember! In a Cayley table, the entry down the left column is written on the left,
and the entry across the top row is written on the right. Since composition is read from
right to left, it means that the entry from the top row is actually performed first!


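$$\begin{array}{c|cccccccc}
\circ & i & \rho & \rho^2 & \rho^3 & \phi & \phi\rho & \phi\rho^2 & \phi\rho^3 \\ \hline
i & i & \rho & \rho^2 & \rho^3 & \phi & \phi\rho & \phi\rho^2 & \phi\rho^3 \\
\rho & \rho & \rho^2 & \rho^3 & i & \phi\rho^3 & \phi & & \\
\rho^2 & \rho^2 & \rho^3 & i & \rho & \phi\rho^2 & & & \\
\rho^3 & \rho^3 & & & & \phi\rho & \phi\rho^2 & & \\
\phi & \phi & & & & & & & \\
\phi\rho & \phi\rho & & & & & & & \\
\phi\rho^2 & \phi\rho^2 & & & & & & & \\
\phi\rho^3 & \phi\rho^3 & & & & & & &
\end{array} \qquad (6.70)$$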
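The identifications quoted above, such as $\phi\rho\phi = \rho^3$ and $(24)(1234)^3 = (12)(34)$, can be confirmed by composing the permutations directly. This sketch stores $\rho = (1234)$ and $\phi = (24)$ as dicts; the helper names and the ASCII labels for the eight elements are our own.

```python
rho = {1: 2, 2: 3, 3: 4, 4: 1}   # (1234)
phi = {1: 1, 2: 4, 3: 3, 4: 2}   # (24)
identity = {a: a for a in rho}

def compose(*perms):
    """Compose left to right as written; the rightmost is applied first."""
    result = dict(identity)
    for p in reversed(perms):
        result = {a: p[result[a]] for a in result}
    return result

def power(p, k):
    """p composed with itself k times (k = 0 gives the identity)."""
    return compose(*([p] * k))

# The eight elements phi^m rho^n with 0 <= m <= 1, 0 <= n <= 3, as in Eq. (6.69).
names = ["i", "rho", "rho^2", "rho^3", "phi", "phi rho", "phi rho^2", "phi rho^3"]
elements = [compose(power(phi, m), power(rho, n)) for m in range(2) for n in range(4)]

def name_of(p):
    """Name of the element of D_8 equal to the permutation p."""
    return names[elements.index(p)]

print(name_of(compose(phi, rho, phi)))   # rho^3
print(compose(phi, power(rho, 3)))       # the permutation (12)(34)

# Every product of two listed elements is again one of the eight.
closed = all(compose(a, b) in elements for a in elements for b in elements)
print(closed)
```

Calling `name_of(compose(a, b))` for a row element `a` and a column element `b` gives the corresponding Cayley table entry, which can serve as a check on hand computations for the rest of the table.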


6.4.4 Normal subgroups

Let’s return to quotient groups and recall from Section 6.3 how our assumption that 
G is abelian got us over the hump of showing that the binary operation on G/H from 
Definition 6.3.1 is well defined. For notational simplicity, write 


$$Ha * Hb = H(ab), \qquad (6.71)$$

using juxtaposition for the binary operation on $G$ and in the coset notation. To show
that $*$ is well defined on $G/H$, we would suppose

$$Ha_1 = Ha_2 \quad \text{and} \quad Hb_1 = Hb_2, \qquad (6.72)$$

and try to show from this that

$$H(a_1 b_1) = H(a_2 b_2). \qquad (6.73)$$

The equations in (6.72) are set equalities, and certainly do not mean that $a_1 = a_2$ or
$b_1 = b_2$. Instead, the claim to be shown is that if $a_1$ and $a_2$ generate the same coset of $H$,
and $b_1$ and $b_2$ generate the same coset of $H$, then $a_1 b_1$ and $a_2 b_2$ will generate the same
coset of $H$. To show Eq. (6.73), we could try to chase an element back and forth between the
cosets. Let's try to do that and see two places where we would get stuck in the absence of $G$
being abelian.

Pick $x \in H(a_1 b_1)$. Then $x$ can be written as $x = h_1 a_1 b_1$ for some $h_1 \in H$. We must
write $x = k a_2 b_2$ for some $k \in H$ to have $x \in H(a_2 b_2)$. We can make partial progress.
Since $h_1 a_1 \in Ha_1$, and since $Ha_1 = Ha_2$, then $h_1 a_1 \in Ha_2$, so that $h_1 a_1 = h_2 a_2$ for
some $h_2 \in H$. Thus $x = h_2 a_2 b_1$. However, now we're stuck because the only way to
replace $b_1$ with $b_2$ is by exploiting the fact that $Hb_1 = Hb_2$, which means we must have
some element of $H$ to the immediate left of $b_1$. Although replacing $h_1 a_1$ with $h_2 a_2$ was
no problem, we need at least to be able to replace $h_2 a_2$ with $a_2 h_3$ for some $h_3 \in H$. This is
a bit like commutativity, but not quite as strong. Then with $x = a_2 h_3 b_1$, we could exploit
$Hb_1 = Hb_2$ to write $x = a_2 h_4 b_2$ for some $h_4 \in H$. We would almost be home at this
point, except that we need to replace $a_2 h_4$ with $h_5 a_2$ for some $h_5 \in H$. If we could get past
these two humps, which are really the same hump met in opposite directions, we would
have written $x = h_5 a_2 b_2 \in H(a_2 b_2)$, and be done with $\subseteq$. Showing $\supseteq$ would be similar.
Therefore, let's make a demand on $H$ that will allow us to get past these humps. The
first question we faced was whether there exists some $h_3 \in H$ such that $h_2 a_2 = a_2 h_3$.
Thus, instead of making the sweeping requirement that $G$ be abelian, we require that $H$
have the following feature.


Definition 6.4.5. Suppose G is a group and H < G. If for all g ∈ G and h ∈ H there
exists h₁ ∈ H such that hg = gh₁, then H is called a normal subgroup of G, and we write
H ◁ G.


Definition 6.4.5 only allows you to swap hg for gh₁. It does not explicitly allow you to
swap gh with some h₁g. However, you can show that it does, and you will in Exercise 5.

All our work here shows that * from Eq. (6.71) is well defined if H ◁ G. Furthermore,
your other work from Section 6.3, Exercise 2, where you did not exploit the abelian nature
of G, completes the proof that G/H is a group, and we arrive at the following:


Theorem 6.4.6. Suppose G is a group and H is a normal subgroup of G. Then G/H with
binary operation * defined by (Ha) * (Hb) = H(ab) is a group with identity He and
inverses (Ha)⁻¹ = Ha⁻¹.
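The role normality plays in making * well defined can be tested on a small case. The following is a minimal sketch, not part of the text's development: it assumes permutations of {1, 2, 3} stored as tuples of images and composed right to left, and the helper names `compose`, `right_coset`, and `product_is_well_defined` are illustrative only.

```python
from itertools import permutations

def compose(f, g):
    """Right-to-left composition: apply g first, then f."""
    return tuple(f[g[i] - 1] for i in range(len(g)))

def right_coset(H, a):
    """The right coset Ha = {ha : h in H}."""
    return frozenset(compose(h, a) for h in H)

def product_is_well_defined(G, H):
    """Check that H(ab) depends only on the cosets Ha and Hb, not on a and b."""
    cosets = {right_coset(H, a) for a in G}
    for A in cosets:
        for B in cosets:
            outcomes = {right_coset(H, compose(a, b)) for a in A for b in B}
            if len(outcomes) != 1:
                return False
    return True

S3 = set(permutations((1, 2, 3)))
identity = (1, 2, 3)
H_not_normal = {identity, (2, 1, 3)}        # {i, (12)}
A3 = {identity, (2, 3, 1), (3, 1, 2)}       # {i, (123), (132)}

print(product_is_well_defined(S3, H_not_normal))  # False
print(product_is_well_defined(S3, A3))            # True
```

With the two-element subgroup {i, (12)}, different representatives of the same cosets land in different product cosets, which is exactly the hump described above; with A₃ the product never depends on the representatives.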


6.4.5 Equivalences and implications of normality 


There are ways other than Definition 6.4.5 to define normal subgroup, and different 
authors approach the idea in different ways. As we'll see in Section 6.5, normality can 
be naturally defined in terms of mappings between groups. However, we’ve chosen our 
definition, so any equivalent forms will have to be demonstrated as theorems. If you read 
the preceding element chasing proof one more time, another way to pinpoint just what 
kind of subgroup H needs to be might jump out at you. Instead of stating our requirement 
as the existence of h₃ ∈ H so that h₂a₂ = a₂h₃, we could have said there exists h₃ ∈ H such
that h₃ = a₂⁻¹h₂a₂, or simply that a₂⁻¹h₂a₂ ∈ H. That makes the proof of the following
immediate:


Theorem 6.4.7. Suppose H < G. Then H is normal if and only if for all h ∈ H and g ∈ G,
g⁻¹hg ∈ H.


It’s one thing to say that a subgroup is closed under the operations of the group.
However, the fact that g⁻¹hg ∈ H for all h ∈ H and g ∈ G adds a new dimension to the
closure of H by saying, in a sense, that H cannot be kicked around by things of the form
g⁻¹( )g. An expression of the form g⁻¹hg is called a conjugate of h. If H is normal, the
conjugates of all its elements lie in H. Thus choosing any h ∈ H and g ∈ G, it might be
that g⁻¹hg is different from h, but at least it’s still in H. This is the way you'll want to
argue the following in Exercise 6.


Theorem 6.4.8. If n > 1, then Aₙ ◁ Sₙ.


If G is not abelian and H < G is not normal, then a conjugate g⁻¹hg might or might
not be in H. For example, using ρ = (1234) ∈ D₈ and (12) ∈ S₄,


(12)⁻¹(1234)(12) = (12)(1234)(12) = (1342) ∉ D₈,    (6.74)


which by Theorem 6.4.7 proves that D₈ ⋪ S₄. On the other hand, using (123) =
(13)(12) ∈ A₄ and (14) ∈ S₄,


(14)⁻¹(123)(14) = (14)(123)(14) = (234) = (24)(23) ∈ A₄,    (6.75)


which is a consequence of the fact that A₄ ◁ S₄. If G is abelian, conjugation is not
particularly interesting.


Theorem 6.4.9. A group G is abelian if and only if every h ∈ G has no conjugates other
than itself.


The proof of Theorem 6.4.9 should be immediately clear, for the conditions that
gh = hg for all g, h ∈ G and g⁻¹hg = h for all g, h ∈ G are equivalent.
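The conjugations in Eqs. (6.74) and (6.75) can be checked mechanically. The sketch below assumes permutations of {1, 2, 3, 4} stored as tuples of images and composed right to left, as in the text. The helper names are illustrative, and generating D₈ from (1234) and the flip (13) is an assumption about labeling; any flip of the square yields the same eight elements.

```python
def cycle(*cyc, n=4):
    """Permutation of {1..n} given by a single cycle, stored as a tuple of images."""
    img = list(range(1, n + 1))
    for a, b in zip(cyc, cyc[1:] + cyc[:1]):
        img[a - 1] = b
    return tuple(img)

def compose(f, g):
    """Right-to-left composition: apply g first, then f."""
    return tuple(f[g[i] - 1] for i in range(len(g)))

def inverse(f):
    inv = [0] * len(f)
    for i, v in enumerate(f, start=1):
        inv[v - 1] = i
    return tuple(inv)

def conjugate(g, h):
    """Return g^{-1} h g."""
    return compose(inverse(g), compose(h, g))

def generate(gens):
    """Subgroup generated by gens (closure under left multiplication)."""
    identity = tuple(range(1, len(gens[0]) + 1))
    group, frontier = {identity}, [identity]
    while frontier:
        x = frontier.pop()
        for s in gens:
            y = compose(s, x)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return group

def is_even(p):
    """A permutation is even when it has an even number of inversions."""
    n = len(p)
    return sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n)) % 2 == 0

D8 = generate([cycle(1, 2, 3, 4), cycle(1, 3)])   # assumed labeling of the square

c1 = conjugate(cycle(1, 2), cycle(1, 2, 3, 4))
print(c1 == cycle(1, 3, 4, 2), c1 in D8)          # True False, as in Eq. (6.74)

c2 = conjugate(cycle(1, 4), cycle(1, 2, 3))
print(c2 == cycle(2, 3, 4), is_even(c2))          # True True, as in Eq. (6.75)
```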

If we think of g as fixed and allow h to take on all values in H, we create what we call 
a conjugate of the subgroup H. Notationally, we write this as 


g⁻¹Hg = {g⁻¹hg : h ∈ H}.    (6.76)


One interesting characteristic of the conjugate of a subgroup is that it is also a 
subgroup of G. You'll prove the following theorem in Exercise 7: 


Theorem 6.4.10. Let G be a group and H < G. Fix g ∈ G. Then g⁻¹Hg < G.


To see an example of a conjugate subgroup, consider D₈ < S₄, and let g = (12). For
some δ ∈ D₈, the expression g⁻¹δg can be thought of as first switching the numbers in
positions 1 and 2 on the square (an illegal move in D₈), doing the δ rotation and/or flip,
and then switching whatever numbers are in positions 1 and 2 again. For example,


g⁻¹ρg = (12)(1234)(12) = (1342) ∉ D₈.    (6.77)


Transforming all the elements of D₈ in this way creates another subgroup of S₄ that
you might need to play around with to understand. It turns out that g⁻¹D₈g as an
algebraic structure is just like D₈, except that the rigid square we described before won't
work as a way to visualize it. Instead, picture the numbers {1, 2, 3, 4} being pushed from
corner to corner by g⁻¹ρg and g⁻¹φg according to Fig. 6.2. The point to be made is that
g⁻¹D₈g ≠ D₈. Granted, they both contain the identity (along with the three products of
disjoint transpositions (12)(34), (13)(24), and (14)(23)), but except for that overlap, they
cut through S₄ in different directions.
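If playing around by hand gets tedious, the conjugate subgroup can be built by brute force. This is only a sketch under stated assumptions: permutations are tuples of images composed right to left, D₈ is listed explicitly as the symmetries of a square whose corners are labeled 1 through 4 in order, and the helper names are hypothetical.

```python
def compose(f, g):
    """Right-to-left composition: apply g first, then f."""
    return tuple(f[g[i] - 1] for i in range(len(g)))

def inverse(f):
    inv = [0] * len(f)
    for i, v in enumerate(f, start=1):
        inv[v - 1] = i
    return tuple(inv)

# D8 as tuples of images: four rotations, then four flips (assumed labeling).
D8 = {
    (1, 2, 3, 4), (2, 3, 4, 1), (3, 4, 1, 2), (4, 1, 2, 3),
    (3, 2, 1, 4), (1, 4, 3, 2), (2, 1, 4, 3), (4, 3, 2, 1),
}

g = (2, 1, 3, 4)  # the transposition (12)
conj_D8 = {compose(inverse(g), compose(d, g)) for d in D8}

# A nonempty finite subset closed under the operation is a subgroup.
is_closed = all(compose(a, b) in conj_D8 for a in conj_D8 for b in conj_D8)

print(len(conj_D8), is_closed, conj_D8 == D8)  # 8 True False
```

The conjugate has the same size as D₈ and is closed under composition, in line with Theorem 6.4.10, yet it is a different subset of S₄.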

If H ◁ G, the situation is slightly different concerning conjugates of H. The proof of
the following theorem should be quick (Exercise 8). It states H ◁ G if and only if H has
no conjugates other than itself.

Figure 6.2 Effects of g⁻¹ρg and g⁻¹φg on the square.


Theorem 6.4.11. Suppose H < G. Then H is normal if and only if g⁻¹Hg = H for all
g ∈ G.


In showing that * is well defined on G/H, we supposed Ha₁ = Ha₂ to get us from
h₁a₁ to h₂a₂. Another feature we might have required of H to get us over the next hump
from h₂a₂ to a₂h₃ involves linking the right coset Ha₂ to the left coset a₂H. That is, if
we had had that Ha₂ = a₂H, our problem would have been solved in precisely the same
way. This leads to the following equivalence of normality, which you'll prove in Exercise 9.


Theorem 6.4.12. Suppose H < G. Then H is normal if and only if Hg = gH for all
g ∈ G.
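The three characterizations in Theorems 6.4.7, 6.4.11, and 6.4.12 can be compared side by side on subgroups of S₄. The sketch below is illustrative only: the function names are hypothetical, permutations are tuples of images composed right to left, and D₈ is listed under the assumption that the corners of the square are labeled 1 through 4 in order.

```python
from itertools import permutations

def compose(f, g):
    """Right-to-left composition: apply g first, then f."""
    return tuple(f[g[i] - 1] for i in range(len(g)))

def inverse(f):
    inv = [0] * len(f)
    for i, v in enumerate(f, start=1):
        inv[v - 1] = i
    return tuple(inv)

def is_even(p):
    n = len(p)
    return sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n)) % 2 == 0

def normal_by_elements(G, H):
    """Theorem 6.4.7: every conjugate g^{-1} h g lies in H."""
    return all(compose(inverse(g), compose(h, g)) in H for g in G for h in H)

def normal_by_conjugate_subgroups(G, H):
    """Theorem 6.4.11: g^{-1} H g = H for all g."""
    return all({compose(inverse(g), compose(h, g)) for h in H} == H for g in G)

def normal_by_cosets(G, H):
    """Theorem 6.4.12: Hg = gH for all g."""
    return all({compose(h, g) for h in H} == {compose(g, h) for h in H} for g in G)

S4 = set(permutations(range(1, 5)))
A4 = {p for p in S4 if is_even(p)}
D8 = {
    (1, 2, 3, 4), (2, 3, 4, 1), (3, 4, 1, 2), (4, 1, 2, 3),
    (3, 2, 1, 4), (1, 4, 3, 2), (2, 1, 4, 3), (4, 3, 2, 1),
}

checks = (normal_by_elements, normal_by_conjugate_subgroups, normal_by_cosets)
print([check(S4, A4) for check in checks])  # [True, True, True]
print([check(S4, D8) for check in checks])  # [False, False, False]
```

A brute-force check on one group proves nothing in general, but it is a quick way to confirm that the three criteria agree on the examples used in this section.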


To sum up, notice how our retreat from the requirement that G be abelian motivated 
the definition of a characteristic of H < G that allows us still to construct the quotient 
group. And notice the following, which you'll prove in Exercise 11. 


Theorem 6.4.13. If G is abelian, then every H < G is normal.


Since our definition of normal subgroup was inspired by a retreat from the global 
condition of G being abelian, let’s compare and contrast normality in all its forms to 
similarly worded statements about abelian groups. 


    If G is abelian and H < G:          If H ◁ G:
1.  For all g, h ∈ G, gh = hg.          For all g ∈ G and h ∈ H, there exists h₁ ∈ H such that hg = gh₁.
2.  For all g, h ∈ G, g⁻¹hg = h.        For all g ∈ G and h ∈ H, g⁻¹hg ∈ H.
3.  For all g ∈ G, g⁻¹Hg = H.           For all g ∈ G, g⁻¹Hg = H.
4.  For all g ∈ G, Hg = gH.             For all g ∈ G, Hg = gH.


As a final observation, consider four sets H ⊆ H₁ ⊆ G ⊆ G₁, where H ◁ G instead
of simply H < G as we considered in Section 6.1. Normality of H as a subgroup of G
says something about the way elements of H behave in the presence of elements of G,
and not simply how they behave among themselves. Thus, if H₁ is also a group, then
H ◁ G will imply H ◁ H₁ as well. For if g⁻¹hg ∈ H for all g ∈ G, then certainly the same
is true for all g ∈ H₁. However, if G₁ is a group, it might be that H ⋪ G₁. Even though
g⁻¹hg ∈ H for all g ∈ G, there might be some g ∈ G₁\G for which g⁻¹hg ∉ H. A
trivial example of the truth of this is that D₈ ◁ D₈, but D₈ ⋪ S₄. And, although it takes a
little playing around to see it clearly, Dg<JAq. 


EXERCISES 


1. For g = (12) and h = (34) in S₄, determine ⟨{g, h}⟩, the subgroup of S₄ generated
by g and h.

2. Prove Theorem 6.4.4: Let n ∈ N. Then Aₙ < Sₙ.

3. Assuming |Sₙ| = n!, show that |Aₙ| = n!/2 by showing that Aₙ has precisely two
cosets.¹⁴

4. Complete Table (6.70), the Cayley table for D₈.¹⁵

5. Suppose H ◁ G. Show that for all g ∈ G and h ∈ H, there exists h₁ ∈ H such that
gh = h₁g.¹⁶

6. Prove Theorem 6.4.8: If n > 1, then Aₙ ◁ Sₙ.

7. Prove Theorem 6.4.10: Let G be a group and H < G. Fix g ∈ G. Then g⁻¹Hg < G.

8. Prove Theorem 6.4.11: Suppose H < G. Then H is normal if and only if g⁻¹Hg = H
for all g ∈ G.

9. Prove Theorem 6.4.12: Suppose H < G. Then H is normal if and only if Hg = gH
for all g ∈ G.

10. In S₄, let f = (123) and g = (14). Is g⟨f⟩ = ⟨f⟩g?

11. Prove Theorem 6.4.13: If G is abelian, then every H < G is normal.

12. In Exercise 8 of Section 6.1, you showed that the center of a group G is a subgroup
of G. Prove that the center of G is normal in G.

13. In Exercise 9 of Section 6.1, you showed that the intersection across a family of
subgroups of G is itself a subgroup of G. Show that the intersection across a family
of normal subgroups of G is normal in G.


¹⁴ When do f₁, f₂ ∈ Sₙ generate the same coset of Aₙ? See the comments that follow Definition 6.3.1.
¹⁵ First convert all expressions of the form ρ^j φ to their equivalents in the form φρ^k. Use these results to
take expressions of the form φ^a ρ^b φ^c ρ^d and reorder ρ^b φ^c in the middle. Thus all ρ’s will gravitate to the
right and all φ’s to the left.
¹⁶ Apply Definition 6.4.5 to h⁻¹g⁻¹.


6.5 Group Morphisms


Now that we have a basic understanding of groups and some of their internal structure,
let’s turn our attention outward to a special type of function from a group G to a group
H. The special feature we want these functions to have is that they preserve the binary
operation.


Definition 6.5.1. Suppose (G, *, e_G, ⁻¹) and (H, ·, e_H, ⁻¹) are groups, and suppose φ :
G → H is a function with the property that φ(x * y) = φ(x) · φ(y) for all x, y ∈ G.
Then φ is called a morphism from G to H. If φ is one-to-one, it is called a monomorphism.
If φ is onto, it is called an epimorphism. If φ is both one-to-one and onto, it is called an
isomorphism, and we write G ≅ H, which is read “G is isomorphic to H.” If φ : G → G is
an isomorphism, φ is called an automorphism.


Example 6.5.2. Let Z be the group of integers under addition, and E the group of all even
integers under addition. Then φ₁ : Z → E defined by φ₁(n) = 2n is an isomorphism, and
φ₂ : Z → E defined by φ₂(n) = 4n is a monomorphism that is not an isomorphism.
The function φ₃ : Z → Z defined by φ₃(n) = −n is an automorphism (Exercise 1).


Example 6.5.3. Let Z be the group of integers under addition, let n ∈ N, and consider
the group (Zₙ, ⊕ₙ, (n) + 0, −). Then φ : Z → Zₙ defined by φ(k) = (n) + k is an
epimorphism that is not an isomorphism (Exercise 2).


Example 6.5.4. Let G = (R⁺, ×, 1, ⁻¹) and H = (R, +, 0, −). Even though we have
not discussed logarithms in this text, your work in precalculus reveals that φ(x) = ln x
is an isomorphism because ln x is a one-to-one function from R⁺ onto R, and satisfies
φ(xy) = ln(xy) = ln x + ln y = φ(x) + φ(y).


Example 6.5.5. Let G and H be any groups, and define φ : G → H by φ(x) = e_H for
all x ∈ G. Then φ is called the trivial morphism.
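The morphism property in Examples 6.5.2 through 6.5.4 can be spot-checked numerically. The sketch below is not a proof, since it only samples finitely many inputs; the sample window, the choice n = 6, and the representation of the coset (n) + k by the remainder k % n are all assumptions made for illustration.

```python
import math

samples = range(-20, 21)

def is_additive_morphism(phi):
    """Check phi(x + y) == phi(x) + phi(y) on the sample window."""
    return all(phi(x + y) == phi(x) + phi(y) for x in samples for y in samples)

def phi1(n): return 2 * n    # Z -> E, Example 6.5.2
def phi2(n): return 4 * n    # Z -> E, Example 6.5.2
def phi3(n): return -n       # Z -> Z, Example 6.5.2

morphic_652 = all(is_additive_morphism(p) for p in (phi1, phi2, phi3))

# Within the window, 2n reaches every even integer while 4n skips some.
evens = {m for m in range(-40, 41) if m % 2 == 0}
phi1_hits_all = evens <= {phi1(k) for k in samples}
phi2_hits_all = evens <= {phi2(k) for k in samples}

# Example 6.5.3 with n = 6: the coset (n) + k is represented by k % n.
n = 6
def phi(k): return k % n
morphic_653 = all(phi(x + y) == (phi(x) + phi(y)) % n for x in samples for y in samples)
not_one_to_one = phi(0) == phi(n)

# Example 6.5.4: ln turns multiplication into addition (up to rounding).
xs = [0.25, 1.5, 2.0, 3.5, 10.0]
morphic_654 = all(
    math.isclose(math.log(x * y), math.log(x) + math.log(y), abs_tol=1e-12)
    for x in xs for y in xs
)

print(morphic_652, phi1_hits_all, phi2_hits_all)   # True True False
print(morphic_653, not_one_to_one, morphic_654)    # True True True
```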


The morphic behavior of φ : G → H does not explicitly require that particular elements
of G must map to particular elements of H. However, there are some restrictions
of this sort inherent in the definition.


Theorem 6.5.6. If φ : G → H is a group morphism, then:
1. φ(e_G) = e_H.
2. For all x ∈ G, φ(x⁻¹) = [φ(x)]⁻¹.

We'll prove part 1 and leave part 2 to you in Exercise 3.

Proof:
e_H · φ(e_G) = φ(e_G) = φ(e_G * e_G) = φ(e_G) · φ(e_G).    (6.78)
By cancellation, φ(e_G) = e_H. ∎

Notice the strange similarity of Eq. (6.78) to Eq. (2.44), where we showed a · 0 = 0 for
all a ∈ R. With part 1 of Theorem 6.5.6 as the root of an induction argument for n ≥ 0,
then with part 2 to take care of n < 0, you can show the following (Exercise 4).

Theorem 6.5.7. If φ : G → H is a group morphism and g ∈ G, then for all n ∈ Z,
φ(gⁿ) = [φ(g)]ⁿ.    (6.79)


The statement G ≅ H in Definition 6.5.1 looks like a form of equivalence. In fact,
you can show the following in Exercise 6.


Theorem 6.5.8. The relation ≅ in Definition 6.5.1 is an equivalence relation on the set of
all groups.


The statement G ≅ H is a statement about the existence of a one-to-one function
from G onto H with the additional property that it preserves the binary operation.
Thus most of the work in proving that ≅ has properties E1–E3 has been done in Chapter 3
and will require only references to applicable theorems. However, in each property
E1–E3, something will have to be shown about the morphic behavior of the functions
involved.

The word isomorphism has a connotation to it that deserves pointing out. For G to be 
isomorphic to H means that G and H are essentially the same group, in the sense that all 
the elements from one can be swapped one for one with those in the other and the internal 
relationships between them as expressed by * are retained by ·. By giving a new name
to every x ∈ G, namely φ(x), and swapping the binary operation symbol in G for the
symbol in H, we have effectively dressed (G, *, e_G, ⁻¹) in the clothing of (H, ·, e_H, ⁻¹).
Thus, to be able to map every element of G to a unique and distinct element of H, 
exhausting all elements of H and preserving the binary operation, is to show that, as far 
as their structure as groups is concerned, they are identical. Here are some illustrations 
of this principle that you'll prove in Exercise 7. 


Theorem 6.5.9. Suppose φ : G → H is an isomorphism. Then,
1. If G is abelian, then H is abelian.
2. If G is cyclic, then H is cyclic.

Instead of writing Rng(φ), we usually write φ(G). Subgroups of the domain and
codomain are related in the following theorem, which you'll prove in Exercise 9.

Theorem 6.5.10. Suppose φ : G → H is a group morphism. Then
1. φ(G) < H.
2. If N ◁ G, then φ(N) ◁ φ(G).
3. If N ◁ H, then φ⁻¹(N) ◁ G.


Notice that part 2 of Theorem 6.5.10 does not say that φ(N) ◁ H. If φ is not onto,
it’s possible that φ(N) is normal in φ(G) but not normal in H.

Theorem 6.5.6 ensures that e_G always maps to e_H under a group morphism. It might
be that other elements of G also map to the identity in H. In Example 6.5.3, every integer
of the form kn where k ∈ Z maps to (n) + 0. In Example 6.5.5, every element of G
maps to e_H. We give a name to the set of all elements in G that map to e_H under a
morphism.


Definition 6.5.11. Suppose φ : G → H is a group morphism. Then φ⁻¹(e_H) = {x ∈
G : φ(x) = e_H} is called the kernel of φ and is denoted Ker(φ). If Ker(φ) = {e_G}, we say
that the kernel of φ is trivial.


One important feature of Ker(φ) is the following, which you'll prove in Exercise 10.


Theorem 6.5.12. If φ : G → H is a group morphism, then Ker(φ) ◁ G.


An interesting property of group morphisms is that you can sometimes learn a lot
about the behavior of φ across all of G by looking at its behavior at certain places in G.
Clearly, if φ is a monomorphism, then Ker(φ) is trivial. Interestingly, the converse of this
is also true (Exercise 11).


Theorem 6.5.13. Suppose φ : G → H is a group morphism. Then φ is one-to-one if and
only if Ker(φ) = {e_G}.


Thus if φ⁻¹(e_H) has only one element, then φ⁻¹(y) has only one element for every
y ∈ φ(G). We can go even further and show that


|φ⁻¹(y)| = |Ker(φ)|    (6.80)


for all y ∈ φ(G), so that all pre-image sets of individual elements in φ(G) have the
same cardinality. Probably the easiest way to do this is to exploit the fact that Ker(φ) ◁ G
in order to prove something even stronger than Eq. (6.80). Theorem 6.5.14 says that
the pre-image of each y ∈ φ(G) is simply one of the cosets of Ker(φ). Since all cosets
of a subgroup have the same cardinality (Theorem 6.3.3), Eq. (6.80) follows. In the
next theorem, we use left cosets to keep the notation uncluttered. You'll prove it in
Exercise 12.


Theorem 6.5.14. If φ : G → H is a group morphism, then for all y ∈ φ(G), there exists
a ∈ G such that φ⁻¹(y) = a Ker(φ).
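To watch Theorem 6.5.14 and Eq. (6.80) at work, consider a small example that is not taken from the text: reduction modulo 4 as a map from Z₁₂ to Z₄, with cosets represented by remainders and written additively. The sketch below computes every pre-image and compares it with a coset of the kernel.

```python
n, m = 12, 4          # hypothetical example: phi maps Z_12 to Z_4
G = range(n)

def phi(k):
    """Reduce modulo 4; this respects addition mod 12 because 4 divides 12."""
    return k % m

morphic = all(phi((x + y) % n) == (phi(x) + phi(y)) % m for x in G for y in G)
kernel = {x for x in G if phi(x) == 0}

preimage_is_coset = {}
for y in sorted({phi(x) for x in G}):
    preimage = {x for x in G if phi(x) == y}
    a = min(preimage)                         # any element of the pre-image works
    coset = {(a + k) % n for k in kernel}     # a + Ker(phi), in additive notation
    preimage_is_coset[y] = (preimage == coset)
    print(y, sorted(preimage), preimage == coset)

print(morphic, sorted(kernel))                # True [0, 4, 8]
```

Each pre-image has exactly three elements, the size of the kernel, which is the content of Eq. (6.80) for this example.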


If you've caught on to what’s happening here, you might have observed that any
group morphism φ : G → H makes some interesting statements about the internal
structure of G. The morphism φ gives rise to a normal subgroup of G, namely Ker(φ).
From there all of our theory from Section 6.4 applies to present us with a quotient group
G/Ker(φ), with its binary operation a Ker(φ) * b Ker(φ) = (ab) Ker(φ), identity Ker(φ)
and inverses [a Ker(φ)]⁻¹ = a⁻¹ Ker(φ). Thus any time you're given a group G and can
manage to find a morphism into some other group H, you've managed to find a normal
subgroup of G and motivate a quotient group from it.

Also, if you're like a lot of people at your stage of the mathematical game, you wish
there were a better way to see what’s going on inside this new quotient group G/Ker(φ)
than by visualizing cosets whacking each other around. Well, there is another way to
visualize G/Ker(φ), for it is essentially the same as (isomorphic to) φ(G). We'll get to a
full-blown statement and proof of that soon.

A group morphism φ : G → H gives rise to a normal subgroup of G and a quotient
group from it. Now let’s go the other way: Given a group G and any N ◁ G, we can create a
group H and an epimorphism φ : G → H such that Ker(φ) = N. It doesn’t involve much
we haven’t seen before, mostly just some observations. You'll provide the only missing
detail in Exercise 13.


Theorem 6.5.15. If (G, *, e, ⁻¹) is a group and N ◁ G, then φ : G → G/N defined by
φ(x) = Nx is an epimorphism whose kernel is N.


Proof: From our previous work, φ is defined on all of G and is well defined. Also, φ
is onto because every Nx ∈ G/N is generated by x ∈ G, and φ(x) = Nx. We must
show that φ behaves morphically and satisfies Ker(φ) = N. Pick x, y ∈ G. Since the
binary operation on G/N is well defined,


φ(xy) = N(xy) = Nx * Ny = φ(x) * φ(y).    (6.81)

Finally, from Exercise 13, Ker(φ) = N. ∎


So we can have it both ways. Two groups and a morphism give rise to a normal 
subgroup of the domain. And a group with a normal subgroup gives rise to another 
group and a morphism onto it. Here’s the final tie. Whichever you start with and use to 
create the other, the range of the morphism and the quotient group of G are isomorphic. 
That is, φ(G) and G/Ker(φ) are essentially the same group, as the following theorem
states. For notational simplicity, we'll assume that a given φ : G → H is onto so that we
don’t have to distinguish between H and φ(G).


Theorem 6.5.16. Suppose φ : G → H is a group epimorphism. Then
G/Ker(φ) ≅ H.    (6.82)


We'll supply only a skeleton of the proof here, leaving most of the details to you in
Exercise 14. Before we begin the proof, look at Fig. 6.3, which illustrates all the groups
and mappings involved. First, there are the given groups G and H with the epimorphism
φ : G → H. Now create a carbon copy of G, but collect all the elements of Ker(φ) and
hogtie them together into a single entity in the sketch of G/Ker(φ). Similarly, go to
each coset of Ker(φ), take all of its elements, and lump them together into a single
entity in G/Ker(φ) to create a visualization of G/Ker(φ). We have a link between G and
G/Ker(φ), and that is the mapping that sends x ∈ G to x Ker(φ). Now it might be that φ
is one-to-one, or maybe not. But the extent to which φ collapses elements of G down
to single elements of H is precisely the extent to which elements of G clump together
into cosets in G/Ker(φ) (Theorem 6.5.14), which itself depends upon the size of the
kernel. The task is to show that G/Ker(φ) and H are isomorphic by finding the required
mapping between them. We’ll imagine mapping an element of G/Ker(φ) to an element
of H, and the way we decide where to map a chosen coset x Ker(φ) ∈ G/Ker(φ) is by
grabbing some element of x Ker(φ), say x, and sending the whole coset to φ(x) in H. In
the proof here, we'll use · to represent the binary operation in H.


Proof: We must find a one-to-one, onto function ψ : G/Ker(φ) → H such that
ψ[x Ker(φ) * y Ker(φ)] = ψ[x Ker(φ)] · ψ[y Ker(φ)]    (6.83)
for all x Ker(φ), y Ker(φ) ∈ G/Ker(φ). Define ψ : G/Ker(φ) → H by
ψ[x Ker(φ)] = φ(x).    (6.84)


Figure 6.3 Isomorphism between G/Ker(φ) and H.


By Exercise 14, ψ is well defined on all of G/Ker(φ), is one-to-one and onto, and
satisfies Eq. (6.83). ∎
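The construction of ψ in Eq. (6.84) can be traced through on a small case. The example below is an assumption chosen for illustration and is not from the text: φ is the sign map from S₃ onto the multiplicative group {1, −1}, whose kernel is A₃. Permutations are tuples of images composed right to left, and all helper names are hypothetical.

```python
from itertools import permutations

def compose(f, g):
    """Right-to-left composition: apply g first, then f."""
    return tuple(f[g[i] - 1] for i in range(len(g)))

def sign(p):
    """+1 for even permutations and -1 for odd ones (counted by inversions)."""
    n = len(p)
    inversions = sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n))
    return 1 if inversions % 2 == 0 else -1

S3 = list(permutations((1, 2, 3)))
kernel = frozenset(p for p in S3 if sign(p) == 1)

def left_coset(x):
    """The left coset x Ker(phi)."""
    return frozenset(compose(x, k) for k in kernel)

cosets = {left_coset(x) for x in S3}

# psi[x Ker(phi)] = phi(x); it is well defined if the choice of x does not matter.
psi = {}
well_defined = True
for C in cosets:
    images = {sign(x) for x in C}
    well_defined = well_defined and len(images) == 1
    psi[C] = min(images)

one_to_one = len(set(psi.values())) == len(psi)
onto = set(psi.values()) == {1, -1}
morphic = all(
    psi[left_coset(compose(x, y))] == psi[left_coset(x)] * psi[left_coset(y)]
    for x in S3 for y in S3
)

print(len(cosets), well_defined, one_to_one, onto, morphic)  # 2 True True True True
```

The five properties checked here correspond to the parts of Exercise 14, carried out by exhaustion for one particular epimorphism.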


As a final note, the theorems from this section provide us with another logical
equivalence to normality of a subgroup. If N ◁ G, then there exists a group H and an
epimorphism φ : G → H such that Ker(φ) = N. Also, if φ : G → H is an epimorphism,
Ker(φ) ◁ G. Thus, given a group G and N < G, N is normal in G if and only if there
exists some group H and some morphism φ : G → H such that N = Ker(φ).


EXERCISES 


1. Verify the claims from Example 6.5.2 concerning φ₁, φ₂, and φ₃.

2. Show that φ from Example 6.5.3 is an epimorphism that is not an isomorphism.

3. Prove part 2 of Theorem 6.5.6: If φ : G → H is a group morphism, then for all
x ∈ G, φ(x⁻¹) = [φ(x)]⁻¹.

4. Prove Theorem 6.5.7: If φ : G → H is a group morphism and g ∈ G, then for all
n ∈ Z, φ(gⁿ) = [φ(g)]ⁿ.

5. Restate Theorems 6.5.6 and 6.5.7 in their additive form.

6. Prove Theorem 6.5.8: The relation ≅ in Definition 6.5.1 is an equivalence relation
on the set of all groups.

7. Prove Theorem 6.5.9: Suppose φ : G → H is an isomorphism. Then,
(a) If G is abelian, then H is abelian.
(b) If G is cyclic, then H is cyclic.

8. Let G be a group, and define φ : G → G by φ(g) = g⁻¹. Assuming φ is a one-to-one
function from G onto itself, show that φ is an automorphism of G if and only if G
is abelian.

9. Prove Theorem 6.5.10: Suppose φ : G → H is a group morphism. Then
(a) φ(G) < H.
(b) If N ◁ G, then φ(N) ◁ φ(G).
(c) If N ◁ H, then φ⁻¹(N) ◁ G.

10. Prove Theorem 6.5.12: If φ : G → H is a group morphism, then Ker(φ) ◁ G.

11. Prove Theorem 6.5.13: Suppose φ : G → H is a group morphism. Then φ is
one-to-one if and only if Ker(φ) = {e_G}.

12. Prove Theorem 6.5.14: If φ : G → H is a group morphism, then for all y ∈ φ(G),
there exists a ∈ G such that φ⁻¹(y) = a Ker(φ).

13. Finish the proof of Theorem 6.5.15 by showing that Ker(φ) = N.

14. Finish the proof of Theorem 6.5.16 by showing that ψ : G/Ker(φ) → H defined by
Eq. (6.84) satisfies the following:
(a) ψ is defined on all of G/Ker(φ).
(b) ψ is well defined.
(c) ψ is one-to-one.
(d) ψ is onto.
(e) For all x Ker(φ), y Ker(φ) ∈ G/Ker(φ),

ψ[x Ker(φ) * y Ker(φ)] = ψ[x Ker(φ)] · ψ[y Ker(φ)].

15. Construct notation and Cayley tables to determine (up to isomorphism) all groups
on five or fewer elements.

16. Find isomorphisms between the four-element groups you found in Exercise 15 and
the following.
(a) The multiplicative group S = {±1, ±i}.
(b) The subgroup of S₄ in Exercise 1 from Section 6.4.

17. Describe all cyclic groups.

18. Show that D₈ ≇ Q, where Q is the quaternion group from Section 6.1.


CHAPTER 7

Rings


We can create algebraic structures of greater complexity than a group by endowing
a set with two binary operations and laying down some assumptions about how
these operations behave, both on their own and in relation to each other. In this
chapter, we'll look at several such structures. Before we do, some explanation is in order
about how we're going to proceed, for the theory of rings is full of all kinds of details that
make a road map very helpful.

First, in Section 7.1, we'll define the most general algebraic structure with two binary
operations, a ring, construct several important examples, and define subring. In
Section 7.2, we'll look at several properties that the most general rings share. At the same
time, we'll make a passing reference to fields, the most specialized kind of ring we'll study.
One particularly important class of rings can be created by what we call adjoining an
element to a given ring. We devote Section 7.3 to this class of examples. In Section 7.4,
we dive down inside a ring to look at specialized substructures of a general ring. Ideals,
principal ideals, prime ideals, and maximal ideals are special types of substructures we'll
see there. In Sections 7.5–7.7, we'll study four increasingly specialized kinds of rings:
integral domains; unique factorization domains; principal ideal domains; and Euclidean
domains. Each class of these structures is a proper subset of the class that comes before
it, so as we progress, we'll demonstrate (or at least refer to) examples that illustrate this.
For example, we’ll see a ring that’s not an integral domain, an integral domain that’s
not a unique factorization domain, etc. In Section 7.8 we'll look at ring morphisms, and
finally, in Section 7.9, we'll build quotient rings.


7.1 Rings and Subrings


7.1.1 Rings defined 


The simplest structure with two binary operations, and therefore where we begin, is called 
a ring. Because the assumptions we make about ring operations so closely resemble 
those for addition and multiplication on Z, it’s common to use the notations + and
either · or juxtaposition for the two operations, and shamelessly call them addition and
multiplication, respectively, even though by doing so we run some slight risks. One is that 
we might inadvertently think that some of the rings we create are more like the integers 
than the assumptions justify, because the WOP of N makes Z a very special kind of ring. 
Thus we have to be careful not to bring any excess baggage from our understanding of 
the integers that the general ring assumptions do not imply. On the other hand, being 
able to envision Z as a sort of quintessential example of a ring means that some of the 
results we proved for the integers will translate directly over to any ring and be clear to us 
right away. Thus, some of the theorems we’ll state in this section will require very little 
in the way of a new proof, but mimic those for Z with very little or no variation. The 
second risk we run in using notation already associated with the integers is that it might 
stifle our imagination when we try to create new and interesting rings. There are quite 
a number of very interesting rings that creative minds have concocted from interesting 
sets and definitions of equality, addition, and multiplication. We’ll see several. Here is 
the definition of ring, along with an enumeration of its defining characteristics: 


Definition 7.1.1. Suppose R is a nonempty set endowed with binary operations + (addition)
and · (multiplication), such that R is an abelian group under +, with additive identity
0 and additive inverse operation −, and · is an associative binary operation that distributes
over + from the left and right; that is, a(b + c) = ab + ac and (b + c)a = ba + ca for all
a, b, c ∈ R. Then the algebraic structure (R, +, 0, −, ·) is called a ring. If multiplication is
a commutative operation, then R is called a commutative ring. If there exists a nonzero identity
element for multiplication, we denote such an element e, and it is called a unity element
or simply a unity.


Before we make some comments about Definition 7.1.1, let’s spell out the essential 
features of a ring in glorious detail: 


(R1) Addition is well defined on R (G1).
(R2) Addition is closed on R (G2).
(R3) Addition is associative (G3).
(R4) There exists 0 ∈ R such that a + 0 = a for all a ∈ R (G4).
(R5) For all a ∈ R, there exists −a ∈ R such that a + (−a) = 0 (G5).
(R6) Addition is commutative (abelian).
(R7) Multiplication is well defined on R (G1).
(R8) Multiplication is closed on R (G2).
(R9) Multiplication is associative (G3).
(R10) For all a, b, c ∈ R, a(b + c) = ab + ac and (b + c)a = ba + ca.


Definition 7.1.1 requires that, if a ring R has a unity element e, it must be nonzero.
The only reason for this is that e = 0 causes R to collapse to {0}, just as it does in the real
numbers if 1 = 0. Since the trivial ring {0} is not particularly exciting in its internal structure,
and since an occasional general result about rings with unity would not apply to {0}, we
simply insist for convenience that e ≠ 0.

Definition 7.1.1 does not mention the existence of multiplicative inverses. Certainly
if a ring has a unity, some elements other than e itself might have a multiplicative inverse.
We'll wait until Section 7.2 before we address their existence.


7.1.2 Examples of rings 


We can only hope that many of our examples of groups and their binary operations can
serve as the raw material from which to construct rings, or at least that new settings
we'll create will involve binary operations whose basic features are transparent enough
to make our path through the verification of properties R1–R10 quick and relatively
painless. Here are a few examples of rings. We'll return to all of them several times in
later sections.


Example 7.1.2. The integers with addition and multiplication form a commutative ring 
with unity element 1. Similarly, Q, R, and C under the same operations are commutative 
rings with unity. The even integers are a commutative ring without unity. 


Example 7.1.3. For n > 2, consider Zₙ with addition ⊕ₙ defined as in Eq. (6.42). Define
multiplication ⊗ₙ in an analogous way:


[(n) + a] ⊗ₙ [(n) + b] = (n) + ab.    (7.1)


Table 6.2 illustrates the behavior of ⊗ₙ in Z. From all our work in Section 6.3,
properties R1–R6 are satisfied. Properties R7–R10 must be shown (Exercise 1). Additionally,
multiplication is commutative, and there exists a multiplicative identity. Thus,
(Zₙ, ⊕ₙ, (n) + 0, −, ⊗ₙ) is a commutative ring with unity (n) + 1.
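Before proving properties R7 through R10 in Exercise 1, it can be reassuring to confirm them by exhaustion for a few small values of n. The sketch below is an illustration rather than a proof: it represents the coset (n) + a by the remainder a with 0 ≤ a < n, and the function names and the range of representatives tested are assumptions.

```python
from itertools import product

def check_ring_Zn(n):
    """Brute-force check of R8-R10, commutativity, and unity for Z_n."""
    R = range(n)
    add = lambda a, b: (a + b) % n
    mul = lambda a, b: (a * b) % n
    closed = all(mul(a, b) in R for a, b in product(R, repeat=2))
    associative = all(
        mul(a, mul(b, c)) == mul(mul(a, b), c) for a, b, c in product(R, repeat=3)
    )
    distributive = all(
        mul(a, add(b, c)) == add(mul(a, b), mul(a, c))
        and mul(add(b, c), a) == add(mul(b, a), mul(c, a))
        for a, b, c in product(R, repeat=3)
    )
    commutative = all(mul(a, b) == mul(b, a) for a, b in product(R, repeat=2))
    unity = all(mul(1, a) == a == mul(a, 1) for a in R)
    return closed and associative and distributive and commutative and unity

def mul_well_defined(n, shifts=range(-3, 4)):
    """R7: the product does not depend on which representatives a + sn, b + tn are used."""
    return all(
        ((a + s * n) * (b + t * n)) % n == (a * b) % n
        for a in range(n) for b in range(n) for s in shifts for t in shifts
    )

print(all(check_ring_Zn(n) for n in range(2, 9)))    # True
print(all(mul_well_defined(n) for n in range(2, 9))) # True
```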


For several reasons it would be a shame not to introduce an example of a matrix ring 
at this point. First, as a mathematical structure, matrices are very important in theoretical 
and applied mathematics. They are the bread and butter of linear algebra and of many 
highly computational processes that have only become practical since computers have 
been with us. For our purposes, they are a storehouse of examples that exhibit all kinds 
of interesting behaviors. 


Example 7.1.4. A matrix is a two-dimensional array, typically of real numbers:

\[
A = \begin{bmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m,1} & a_{m,2} & \cdots & a_{m,n}
\end{bmatrix} \tag{7.2}
\]

The matrix A in Eq. (7.2) is said to have dimensions m by n, and the set of all m × n
matrices with real number entries is denoted R_{m×n}. If A has dimensions m × n, we
sometimes write it as A_{m×n} if we need to display its dimensions. Notice how entries of
A are tagged with a row and a column number, and in that order. The commas in the
subscripts are annoying to write and probably not necessary, so we'll omit them if we can
get away with it and still be clear. Thus a₄₂ means the entry down in row 4 and across in
column 2. Most of the matrices we’ll use will be 2 × 2 because we don’t need to be more
general than that to construct some interesting examples.

Before we even discuss binary operations on sets of matrices, we need first to define
what it means for two matrices to be equal. We'll say two matrices A_{m₁×n₁} and B_{m₂×n₂}
are equal if two conditions are satisfied. First, the dimensions of A must be the same as
those of B; that is, m₁ = m₂ and n₁ = n₂. Second, all their corresponding entries must
be equal: aⱼₖ = bⱼₖ for all 1 ≤ j ≤ m₁ = m₂ and 1 ≤ k ≤ n₁ = n₂. This definition
clearly yields an equivalence relation: because equality of dimensions and equality of
real number entries both satisfy properties E1–E3, our definition of matrix equality
satisfies them as well.

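Addition of matrices requires that they have the same dimensions, and A + B is
merely the matrix whose entries are the sums of the corresponding entries from A and
B. For the 2 × 2 case, writing
$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$ and
$B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}$, we have

\[
A + B = \begin{bmatrix}
a_{11} + b_{11} & a_{12} + b_{12} \\
a_{21} + b_{21} & a_{22} + b_{22}
\end{bmatrix}. \tag{7.3}
\]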


Matrix multiplication is more complicated than addition. In general, in order for 
the product AB to be defined, A and B do not have to be the same dimensions, but there 
is some relationship between their dimensions that must be satisfied. Since two square 
matrices (n x n) can always be multiplied to produce another n x n matrix, we'll define 
multiplication only for this case. The 2 x 2 case should get the point across, but we'll 
explain it in more general language. Writing 


$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} e & f \\ g & h \end{bmatrix}, \tag{7.4}$$

we define $AB$ in the following way. To calculate entry $(j, k)$ (row $j$, column $k$) in the
product, mentally highlight row $j$ of $A$ and column $k$ of $B$. Mentally run your fingers
across row $j$ of $A$ and down column $k$ of $B$, multiplying the pairs $a_{j1}b_{1k}$, $a_{j2}b_{2k}$, etc.,
and add up these products. This sum of product pairs is entry $(j, k)$ in $AB$. Thus, for $A$
and $B$ in Eq. (7.4),


$$AB = \begin{bmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{bmatrix}. \tag{7.5}$$


If you create two 2 x 2 matrices pretty much at random, you'll probably see that matrix 
multiplication defined this way is not commutative. With these definitions of matrix 
addition and multiplication, let’s consider 


$$\mathbb{R}_{2 \times 2} = \left\{ \begin{bmatrix} a & b \\ c & d \end{bmatrix} : a, b, c, d \in \mathbb{R} \right\} \tag{7.6}$$


and show that it's a noncommutative ring with a unity element. That matrix addition
is well defined (R1) is transparent and notationally tedious. If $A = B$ and $C = D$, then
showing $A + C = B + D$ amounts to nothing more than applying the fact that addition
is well defined on $\mathbb{R}$ to each entry in $A + C$ and $B + D$. Similar drudgery reveals that
matrix multiplication is well defined, so property R7 holds. Closure of addition and
multiplication in $\mathbb{R}$ means that all entries in $A + B$ and $AB$ are real numbers, so that
properties R2 and R8 hold. Associativity and commutativity of addition in $\mathbb{R}$ mean
properties R3 and R6 hold. The matrix denoted 0, with all zero entries, is the additive
identity, and

$$-\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} -a & -b \\ -c & -d \end{bmatrix}. \tag{7.7}$$


The only remaining properties are R9 and R10, associativity of multiplication, and left
and right distributivity. These are not immediately obvious, but verifying them with all
the necessary notation is about as exciting as counting stripes on the highway, and you're
about as likely to make a mistake.

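Thus $\mathbb{R}_{2 \times 2}$ is a noncommutative ring. Does it have a unity element? Why yes it does.
Writing

$$I_{n \times n} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}, \tag{7.8}$$

the matrix with ones down what we call the main diagonal and zeroes elsewhere, you can
see that $I_{2 \times 2}A = AI_{2 \times 2} = A$ for all $A \in \mathbb{R}_{2 \times 2}$. Finally, by exactly the same reasoning
already used here, $\mathbb{R}_{n \times n}$ is a noncommutative ring with unity element $I_{n \times n}$, as are $\mathbb{Q}_{n \times n}$
and $\mathbb{Z}_{n \times n}$.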
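If you'd rather let a machine run its fingers across the rows and down the columns, here is a minimal sketch of the $2 \times 2$ definitions in Python. The function names and the two sample matrices are our own illustrative choices, not anything fixed by the text; matrices are stored as tuples of rows.

```python
# Minimal sketch of 2x2 matrix addition and multiplication as defined above.
# mat_add, mat_mul, and the sample matrices A and B are illustrative assumptions.

def mat_add(A, B):
    """Entrywise sum of two 2x2 matrices."""
    return tuple(tuple(A[j][k] + B[j][k] for k in range(2)) for j in range(2))

def mat_mul(A, B):
    """Entry (j, k) is the sum of products across row j of A and down column k of B."""
    return tuple(
        tuple(sum(A[j][i] * B[i][k] for i in range(2)) for k in range(2))
        for j in range(2)
    )

I2 = ((1, 0), (0, 1))   # the unity element of Eq. (7.8) with n = 2
A = ((1, 2), (3, 4))
B = ((0, 1), (1, 0))

print(mat_mul(A, B))    # ((2, 1), (4, 3))
print(mat_mul(B, A))    # ((3, 4), (1, 2))
print(mat_mul(I2, A) == A == mat_mul(A, I2))
```

The two products differ, so this one pair already shows that multiplication is not commutative, while the last line checks the unity property for this particular $A$ only.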


Here's one more example of a ring that we can create from two given rings $R$ and $S$.
Verifying that this creation is a ring is both tedious and transparent, but you should at
least mentally walk through the steps (or at least the first few of them) to see that it satisfies
properties R1-R10. Rather than going crazy with notation to distinguish operations and
elements of $R$ from those of $S$, writing expressions like $0_R$ and $+_S$, we'll assume your
acquired level of mathematical sophistication makes them unnecessary. Just make sure
you notice which set all the operations are being performed in.


Example 7.1.5. Suppose $R$ and $S$ are rings, and consider

$$R \times S = \{(r, s) : r \in R,\ s \in S\}, \tag{7.9}$$

the Cartesian product of $R$ and $S$. Define $(r_1, s_1) = (r_2, s_2)$ in $R \times S$ if $r_1 = r_2$ and
$s_1 = s_2$. Define addition and multiplication in $R \times S$ by

$$(r_1, s_1) + (r_2, s_2) = (r_1 + r_2, s_1 + s_2) \quad \text{and} \quad (r_1, s_1) \cdot (r_2, s_2) = (r_1 r_2, s_1 s_2). \tag{7.10}$$

Then $R \times S$ is a ring under the operations defined in Eqs. (7.10). The zero element of
$R \times S$ is $(0, 0)$, and $-(r, s) = (-r, -s)$. If $R$ and $S$ each have a unity element, then so
does $R \times S$, and $e_{R \times S} = (e_R, e_S)$.
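To see the componentwise operations of Eqs. (7.10) in action, here is a small Python sketch. It assumes, purely for illustration, that $R = \mathbb{Z}_2$ and $S = \mathbb{Z}_3$; the moduli and function names are our own choices. Notice that each coordinate is reduced in its own ring.

```python
# Sketch of the product ring R x S, assuming R = Z_2 and S = Z_3 for illustration.
from itertools import product

M, N = 2, 3  # hypothetical moduli for the two factor rings

def add(x, y):
    """Componentwise addition: first slot in Z_M, second slot in Z_N."""
    return ((x[0] + y[0]) % M, (x[1] + y[1]) % N)

def mul(x, y):
    """Componentwise multiplication, again one ring per slot."""
    return ((x[0] * y[0]) % M, (x[1] * y[1]) % N)

elements = list(product(range(M), range(N)))
zero, unity = (0, 0), (1, 1)

# Spot-check the identities and left distributivity over every element.
print(all(add(x, zero) == x and mul(x, unity) == x for x in elements))
print(all(mul(x, add(y, z)) == add(mul(x, y), mul(x, z))
          for x in elements for y in elements for z in elements))
```

A finite brute-force check like this is no substitute for walking through R1-R10, but it is a quick sanity test that the definitions were typed in correctly.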
7.1.3 Subrings

If $R$ is a ring and $S \subseteq R$ is also a ring under the same operations, we say that $S$ is a subring
of $R$. In demonstrating that $S \subseteq R$ is a subring, some of properties R1-R10 are inherited from
$R$, while some must be shown for $S$. Take a look at properties R1-R10 again and note for
yourself which ones must be demonstrated for $S$. Here they are:


(S1) S is closed under addition (R2, H1). 

(S2) S contains the additive identity (R4, H2). 
(S3) S is closed under additive inverses (R5, H3). 
(S4) S is closed under multiplication (R8, H1).


Thus a subring $S$ of a ring $R$ is merely a subgroup of the additive group that is also
closed under multiplication. We call $\{0\}$ the trivial subring, and all subrings other than
$\{0\}$ and $R$ itself are called proper subrings. If $R$ has a unity element $e$, it is not necessary
that $e \in S$ in order for $S$ to be a subring of $R$.


Example 7.1.6. The set of even integers is a subring of Z. It is also a subring of Q and 
R. The integers are a subring of Q and of R. The rationals are a subring of R. 


Example 7.1.7. $\mathbb{Z}_{2 \times 2}$ is a subring of $\mathbb{R}_{2 \times 2}$.


Example 7.1.8. Call a square matrix diagonal if its only nonzero entries lie on the main
diagonal. Let $D_{2 \times 2}$ be the subset of $\mathbb{Z}_{2 \times 2}$ consisting of the diagonal matrices. Then $D_{2 \times 2}$
is a subring of $\mathbb{Z}_{2 \times 2}$ (Exercise 2).


Example 7.1.9. In this example, we create a subring of $\mathbb{Q}$ that we'll return to in Section 7.6.
Let $\mathbb{Q}_{OD}$ be the subset of $\mathbb{Q}$ consisting of the rational numbers whose denominators are odd.
There is more than one way to denote elements of $\mathbb{Q}_{OD}$. The form


$$\frac{m}{n}, \quad \text{where } m, n \in \mathbb{Z} \text{ and } n \text{ is odd}, \tag{7.11}$$


is an obvious way, but another useful way to denote the set is to exploit the prime 
factorization of the numerator and denominator, isolating 2 to keep it separate from all 
the other primes involved. This works for all elements except zero, which we'll throw in 
separately. 


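$$\mathbb{Q}_{OD} = \{0\} \cup \left\{ \pm\frac{2^n p_1 p_2 \cdots p_r}{q_1 q_2 \cdots q_s} : n \in \mathbb{W} \text{ and } p_i, q_j \text{ are odd primes for all } 1 \le i \le r \text{ and } 1 \le j \le s \right\}. \tag{7.12}$$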


How much repetition there is among the $p_i$ and $q_j$ doesn't matter. And notice that the
form of an element in Eq. (7.12) includes 1 by letting $n = 0$, $r = s = 1$, and $p_1 = q_1$.
Verifying that $\mathbb{Q}_{OD}$ has properties S1-S4 is quick. Using either form (7.11) or (7.12), we
see that both addition and multiplication are closed because the product of denominators
of two elements is odd. That $\mathbb{Q}_{OD}$ contains 0 and additive inverses is obvious. Thus $\mathbb{Q}_{OD}$
is a subring of $\mathbb{Q}$.
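If you want to poke at the closure claims before proving them, Python's standard `fractions.Fraction` type keeps rationals in lowest terms, so a membership test only has to look at the parity of the denominator. The helper name `in_q_od` and the sample set are illustrative assumptions, and a finite sample like this is evidence, not a proof.

```python
# Sketch: spot-check S1, S3, and S4 for rationals with odd denominators.
from fractions import Fraction
from itertools import product

def in_q_od(q):
    """True if q, written in lowest terms, has an odd denominator."""
    return q.denominator % 2 == 1

# A small illustrative sample of elements with odd denominators.
samples = [Fraction(m, n) for m in range(-4, 5) for n in (1, 3, 5, 7)]

closed = all(
    in_q_od(x + y) and in_q_od(x * y) and in_q_od(-x)
    for x, y in product(samples, repeat=2)
)
print(closed)                     # True
print(in_q_od(Fraction(3, 6)))    # False: 3/6 reduces to 1/2
```

The second line is a reminder that membership is a property of the number, not of one particular way of writing it.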



If you're required to show that a particular $S \subseteq R$ is a subring of $R$, you might save
yourself some time if you exploit the fact that S1-S3 are merely the subgroup properties
H1-H3 applied to $R$ as an abelian additive group. If you already know that $(S, +, 0, -)$
is a subgroup of $(R, +, 0, -)$, then a lot of your work in showing $S$ is a subring of $R$ is
already done. Keep that in mind when you prove the following in Exercise 4.


Theorem 7.1.10. Suppose $\mathcal{F}$ is a family of subrings of a ring $R$. Then $\bigcap_{S \in \mathcal{F}} S$ is a subring
of $R$.


Although subrings are interesting in their own right, they are bland when compared 
to the special kind of subring called an ideal. Ideals come in all kinds of interesting flavors, 
and we'll taste several in Section 7.4. 


EXERCISES 


1. For $\mathbb{Z}_n$ in Example 7.1.3, show the following:

(a) Addition and multiplication satisfy properties R7-R10.
(b) Multiplication is commutative.
(c) There exists a multiplicative identity.

2. Show that the set $D_{2 \times 2}$ in Example 7.1.8 is a subring of $\mathbb{Z}_{2 \times 2}$.

3. Show that the set of all rational numbers with even numerator and odd denominator
is a subring of $\mathbb{Q}_{OD}$.

4. Prove Theorem 7.1.10: Suppose $\mathcal{F}$ is a family of subrings of a ring $R$. Then $\bigcap_{S \in \mathcal{F}} S$ is a
subring of $R$.


5. Define $e_R$ to be a right unity for a ring $R$ if $r \cdot e_R = r$ for all $r \in R$. Similarly, define
$e_L$ to be a left unity if $e_L \cdot r = r$ for all $r \in R$. Let

$$L = \left\{ \begin{bmatrix} a & 0 \\ b & 0 \end{bmatrix} : a, b \in \mathbb{R} \right\}. \tag{7.13}$$

(a) Show $L$ is a ring by showing it is a subring of $\mathbb{R}_{2 \times 2}$.
(b) Find an element of $L$ that is a right unity but not a left unity.
(c) Show that this right unity is not unique.


7.2 Ring Properties and Fields 


7.2.1 Ring properties 


Because a ring is an abelian group under its addition operation, all the properties of abelian 
groups that you proved in Chapter 6 apply to addition. With regard to multiplication and 
its interaction with addition, it would probably be a good idea to swing back by Section 2.3 
and point out those theorems we proved for R that used only its ring properties. Many 
theorems will then translate very similarly over to a general ring. The only difference is that
multiplication might not be commutative, so we have to state and prove certain theorems 
in two-sided language to get the full strength. We’ll state them here, with appropriate 
comments along the way. You'll prove some of them in Exercises 1 and 2. The corollaries 
should be mere observations. 


Theorem 7.2.1. If $R$ is a ring, then $a \cdot 0 = 0 \cdot a = 0$ for all $a \in R$.


Theorem 7.2.1 implies that a ring with unity will not contain $0^{-1}$.


Theorem 7.2.2. If $R$ is a ring and $a, b \in R$, then
1. $(-a)b = -(ab)$,
2. $a(-b) = -(ab)$,
3. $(-a)(-b) = ab$.


Corollary 7.2.3. If $R$ is a ring with unity $e$, then $(-e)a = a(-e) = -a$ for all $a \in R$.


If $R$ has a unity element, the next theorem follows immediately from Corollary 7.2.3
in the same way that Corollary 2.3.8 follows from Corollary 2.3.7 for $\mathbb{R}$. However, if $R$
has no unity element, such a proof won't work. Not to worry. Since a ring is an abelian
group with respect to addition, Eq. (6.18) applies, which says for the $n = -1$ case that in
an abelian group $(G, *, e, {}^{-1})$, $(a * b)^{-1} = a^{-1} * b^{-1}$. Since the operation we're working
with in $R$ is addition, that translates to the following additive form.


Theorem 7.2.4. If $R$ is a ring, then $-(a + b) = (-a) + (-b)$ for all $a, b \in R$.

The distributive property extends nicely in a general ring to yield the following result
analogous to Exercises 3 and 4 from Section 2.4.


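Theorem 7.2.5. If $R$ is a ring and $a, b_1, b_2, \ldots, b_n \in R$, then

$$a \left( \sum_{k=1}^{n} b_k \right) = \sum_{k=1}^{n} (a b_k). \tag{7.14}$$


Theorem 7.2.6. If $R$ is a ring and $a_1, a_2, \ldots, a_m, b_1, b_2, \ldots, b_n \in R$, then

$$\left( \sum_{j=1}^{m} a_j \right) \left( \sum_{k=1}^{n} b_k \right) = \sum_{j=1}^{m} \sum_{k=1}^{n} a_j b_k. \tag{7.15}$$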


Just as in a group, if a ring has a unity element, it can have only one. Your proof 
from Exercise 4 in Section 6.1 will probably translate directly over to a general ring pretty 
much word for word. 


Theorem 7.2.7. If R is a ring with unity, then the unity element is unique. 


Even though elements of a ring with unity are not assumed to have multiplicative
inverses, it might be that some of them do. If for $x \in R$ there exists $y \in R$ such that
$xy = yx = e$, then $x$ is called a unit of $R$. Notice that Theorem 7.2.1 implies that zero
is not a unit. In Exercise 3 you'll find the units of certain rings.
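For a small finite ring, you can hunt for units by sheer brute force: try every $y$ and see whether $xy = e$. The sketch below does this for $\mathbb{Z}_{10}$; the function name `units` is our own, and the search leans on the assumption that multiplication in $\mathbb{Z}_n$ is commutative, so checking $xy = 1$ is enough.

```python
# Sketch: brute-force search for the units of Z_n (here n = 10).
# Multiplication in Z_n is commutative, so xy = 1 already gives yx = 1.

def units(n):
    """Return every x in Z_n for which some y in Z_n satisfies xy = 1."""
    return [x for x in range(n) if any((x * y) % n == 1 for y in range(n))]

print(units(10))   # [1, 3, 7, 9]
```

In keeping with the remark above, 0 never shows up in the output.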



Divisibility in a ring is defined in pretty much the same way it is for $\mathbb{Z}$, except that
we must distinguish between left and right divisors.


Definition 7.2.8. Let $R$ be a ring, $a, b \in R$, and $a \ne 0$. Then $a$ is called a left divisor of
$b$ if there exists nonzero $k \in R$ such that $ak = b$. Similarly, $a$ is called a right divisor of $b$
if there exists nonzero $k \in R$ such that $ka = b$. If $R$ is a commutative ring and there exists
nonzero $k \in R$ such that $ak = b$, we say simply that $a$ is a divisor of $b$, or that $a$ divides $b$,
and we write $a \mid b$.


If $a$ divides $b$, it does not necessarily mean that the $k \in R$ such that $ak = b$ or $ka = b$
is unique. In Exercise 4, you'll show that this is true. In a more specialized ring we'll study
in Section 7.5, however, we will have uniqueness.

Having defined divisors, we can now define what it means for an element of a ring
to be prime, although we won't really look into any of its properties until Section 7.5. To
motivate the term by returning to $\mathbb{N}$, a prime $p$ has exactly two distinct natural number
divisors, 1 and $p$. By definition, 1 is not called prime. Thus if $p \in \mathbb{N}$ is prime and $p = ab$
is any factorization of $p$ where $a, b \in \mathbb{N}$, then either $a = 1$ or $b = 1$. In the ring of
integers, primes are extended to include the negatives of the primes in $\mathbb{N}$. Thus in $\mathbb{Z}$, for
$p$ to be prime means that if $p = ab$ is any factorization of $p$ where $a, b \in \mathbb{Z}$, then either
$a$ or $b$ is $\pm 1$. If you've already done Exercise 3a, you showed that $\pm 1$ are the units in $\mathbb{Z}$.
In a general ring, a prime element is defined by using the language of divisors and units.


Definition 7.2.9. Suppose $R$ is a ring, and $p \in R$ is not a unit. Then $p$ is said to be prime
if every factorization $p = ab$ with $a, b \in R$ implies either $a$ or $b$ is a unit in $R$.


Definition 7.2.9 automatically excludes zero from being prime, for $0 \cdot 0 = 0$, and 0 is not
a unit.

In Exercises 2 and 6 from Section 2.3, you proved multiplicative cancellation for 
nonzero real numbers and the principle of zero products. In a ring, these properties do 
not necessarily apply, either as part of the definition or as logical consequences of it. These 
properties will reappear in Section 7.5, though, when we discuss integral domains. From 
Definition 7.2.8, if ab = 0 while neither a nor b is zero, then a and b are called divisors 
of zero or zero divisors. In Exercise 5, you'll find some zero divisors in certain rings, and 
in Exercise 7, you'll prove the following: 


Theorem 7.2.10. If $a$ is a divisor of zero in a ring with unity, then $a$ is not a unit.

With Theorem 7.2.10, you can prove the following (Exercise 8):

Theorem 7.2.11. If $R$ is a ring with unity and $x \in R$ is a unit, then the $y \in R$ satisfying
$xy = yx = e$ is unique.


Many of the results in Section 2.3 involved ordering of real numbers as measured
by $<$. Since a ring doesn't necessarily have any such way of comparing its elements,
none of these results have meaning in a ring without such a basis for comparison being
defined. Thus, rings do not necessarily have positive and negative elements, there is not
necessarily a way to measure the size of elements as absolute value does in $\mathbb{R}$, and there's
not necessarily a way to make the WOP applicable to subsets. Just because the equation
$x^2 = -1$ has no solution in $\mathbb{R}$ does not mean that the equation $x^2 = -e$ cannot have a
solution in some other ring with unity $e$.

Since a ring with its addition operation is an abelian group, we can apply the additive
forms of the recursive definitions in Eqs. (6.13-6.15) as you wrote them in Exercise 2
from Section 6.2:

$$0a = 0, \tag{7.16}$$
$$(n + 1)a = na + a, \tag{7.17}$$
$$(-n)a = -(na). \tag{7.18}$$


As we stated in Section 6.2, it’s important to keep in mind what is an integer and what is 
a ring element in these equations. 


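Example 7.2.12. In $\mathbb{Z}_{2 \times 2}$, let $A = \begin{bmatrix} 1 & 2 \\ -1 & 0 \end{bmatrix}$. Then

$$0A = 0 \begin{bmatrix} 1 & 2 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \tag{7.19}$$

$$6A = \underbrace{\begin{bmatrix} 1 & 2 \\ -1 & 0 \end{bmatrix} + \cdots + \begin{bmatrix} 1 & 2 \\ -1 & 0 \end{bmatrix}}_{6 \text{ times}} = \begin{bmatrix} 6 & 12 \\ -6 & 0 \end{bmatrix}, \tag{7.20}$$

$$-6A = -(6A) = -\begin{bmatrix} 6 & 12 \\ -6 & 0 \end{bmatrix} = \begin{bmatrix} -6 & -12 \\ 6 & 0 \end{bmatrix}. \tag{7.21}$$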


Writing Eqs. (6.16-6.18) in their additive forms as you also did in Section 6.2, we
have for all $a, b \in R$ and $m, n \in \mathbb{Z}$

$$ma + na = (m + n)a, \tag{7.22}$$
$$m(na) = (mn)a, \tag{7.23}$$
$$n(a + b) = na + nb. \tag{7.24}$$

Again, be careful to distinguish Eqs. (7.22-7.24) from the associative and distributive
properties that characterize $R$ and $\mathbb{Z}$ individually.
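One way to keep straight what is an integer and what is a ring element is to code the recursive definition so that the integer $n$ only ever controls how many times the ring's own addition is applied. Here is a hedged sketch for $2 \times 2$ integer matrices; the helper names and the sample matrices `X` and `Y` are illustrative assumptions, not the matrices of the surrounding examples.

```python
# Sketch: integer multiples nA built from ring addition alone,
# following 0A = 0, (n+1)A = nA + A, and (-n)A = -(nA).
# X and Y are arbitrary illustrative matrices.

def add(A, B):
    return tuple(tuple(p + q for p, q in zip(ra, rb)) for ra, rb in zip(A, B))

def neg(A):
    return tuple(tuple(-p for p in row) for row in A)

ZERO = ((0, 0), (0, 0))

def multiple(n, A):
    """The ring element nA for an integer n; n is a counter, not a ring element."""
    if n < 0:
        return neg(multiple(-n, A))
    total = ZERO
    for _ in range(n):
        total = add(total, A)
    return total

X = ((1, 2), (3, 4))
Y = ((5, 0), (-2, 1))
r = range(-3, 4)

# Spot-check the three additive laws on a small range of integers.
print(all(add(multiple(m, X), multiple(n, X)) == multiple(m + n, X) for m in r for n in r))
print(all(multiple(m, multiple(n, X)) == multiple(m * n, X) for m in r for n in r))
print(all(multiple(n, add(X, Y)) == add(multiple(n, X), multiple(n, Y)) for n in r))
```

Nowhere does the code multiply a matrix by a number; that is exactly the distinction the text is warning you about.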


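Example 7.2.13. For $A = \begin{bmatrix} 1 & 2 \\ -1 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 5 & 0 \\ 2 & 1 \end{bmatrix}$,

$$4A + 2A = 4\begin{bmatrix} 1 & 2 \\ -1 & 0 \end{bmatrix} + 2\begin{bmatrix} 1 & 2 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} 4 & 8 \\ -4 & 0 \end{bmatrix} + \begin{bmatrix} 2 & 4 \\ -2 & 0 \end{bmatrix} = \begin{bmatrix} 6 & 12 \\ -6 & 0 \end{bmatrix} = 6\begin{bmatrix} 1 & 2 \\ -1 & 0 \end{bmatrix} = 6A, \tag{7.25}$$

$$4(2A) = 4\begin{bmatrix} 2 & 4 \\ -2 & 0 \end{bmatrix} = \begin{bmatrix} 8 & 16 \\ -8 & 0 \end{bmatrix} = 8\begin{bmatrix} 1 & 2 \\ -1 & 0 \end{bmatrix} = 8A, \tag{7.26}$$

$$4(A + B) = 4\begin{bmatrix} 6 & 2 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 24 & 8 \\ 4 & 4 \end{bmatrix} = \begin{bmatrix} 4 & 8 \\ -4 & 0 \end{bmatrix} + \begin{bmatrix} 20 & 0 \\ 8 & 4 \end{bmatrix} = 4\begin{bmatrix} 1 & 2 \\ -1 & 0 \end{bmatrix} + 4\begin{bmatrix} 5 & 0 \\ 2 & 1 \end{bmatrix} = 4A + 4B. \tag{7.27}$$
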
Even though the multiplication operation in a ring is not necessarily accompanied
by an identity and inverses for elements, we can use the multiplicative forms of the
definitions of $a^n$ in a limited way. For any $a \in R$, we begin by defining $a^1 = a$ and
$a^{n+1} = a^n \cdot a$ for $n \ge 1$. If $R$ has a unity element $e$, we also define $a^0 = e$, but only for
$a \ne 0$. If $a$ is a unit of $R$, we can define $a^{-n} = (a^{-1})^n$ for $n \in \mathbb{N}$.

Example 7.2.14. In $\mathbb{Z}_{10}$ from Example 7.1.3,
Say. Sony Bea Oy aS ee. zie. | 17.28) 
Dale SS ae. On. Or 2 SIONS = oy: Ste, 91729) 
Since 3 is a unit in $\mathbb{Z}_{10}$, and $3^{-1} = 7$, we have

$$3^{-1} = 7, \quad 3^{-2} = (3^{-1})^2 = 7^2 = 9, \quad 3^{-3} = 7^3 = 3, \quad \text{etc.} \tag{7.30}$$
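The negative powers of a unit in $\mathbb{Z}_n$ are easy to check by machine. In Python 3.8 and later, the built-in three-argument `pow` accepts a negative exponent when the base is invertible modulo the third argument, which matches the definition $a^{-n} = (a^{-1})^n$. This sketch simply reproduces the negative powers of 3 in $\mathbb{Z}_{10}$.

```python
# Sketch: negative powers of the unit 3 in Z_10, using Python's built-in
# modular pow (negative exponents require Python 3.8 or later).

inverse_of_3 = pow(3, -1, 10)
print(inverse_of_3)                           # 7

# a^(-n) computed two ways: directly, and as (a^(-1))^n.
direct = [pow(3, -n, 10) for n in range(1, 5)]
via_inverse = [pow(inverse_of_3, n, 10) for n in range(1, 5)]
print(direct)                                 # [7, 9, 3, 1]
print(direct == via_inverse)                  # True
```

If you try the same call with a non-unit, such as `pow(2, -1, 10)`, Python raises a `ValueError`, which is its way of saying that $a^{-n}$ is defined only for units.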


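Example 7.2.15. In $\mathbb{Z}_{2 \times 2}$, let $A = \begin{bmatrix} 1 & 2 \\ -1 & 0 \end{bmatrix}$. Then

$$A^2 = A \cdot A = \begin{bmatrix} -1 & 2 \\ -1 & -2 \end{bmatrix}, \quad \text{etc.}$$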


With these definitions of $a^n$ for appropriate $n \in \mathbb{Z}$, and by arguments exactly like
those in Exercise 6 from Section 2.4, we have the following:

Theorem 7.2.16. Suppose $R$ is a ring, $a, b \in R$, and $m, n \in \mathbb{N}$. Then

$$a^m \cdot a^n = a^{m+n}, \tag{7.31}$$
$$(a^m)^n = a^{mn}. \tag{7.32}$$

If $R$ has a unity element and $a \ne 0$, Eqs. (7.31) and (7.32) hold for $m, n \in \mathbb{W}$. If $a$ is a unit,
Eqs. (7.31) and (7.32) hold for all $m, n \in \mathbb{Z}$. Furthermore, if $R$ is commutative,

$$(ab)^n = a^n b^n \tag{7.33}$$

for all $n \in \mathbb{Z}$ for which $a^n$ and $b^n$ are defined.


One big difference in the way we mentally visualize $\mathbb{Z}$ and $\mathbb{Z}_n$ is that $\mathbb{Z}$ extends
out indefinitely along the number line in both directions, whereas $\mathbb{Z}_n$ is circular. If we
generate both $\mathbb{Z}$ and $\mathbb{Z}_n$ by considering $1$, $1 + 1$, $1 + 1 + 1$, etc., no expression of the
form $n1 = \sum_{k=1}^{n} 1$ ever produces a sum of zero in $\mathbb{Z}$, but $n1 = \sum_{k=1}^{n} 1 = 0$ in $\mathbb{Z}_n$. In
a general ring with unity element $e$, whether or not some expression $ne = \sum_{k=1}^{n} e = 0$
ever occurs motivates a term.


Definition 7.2.17. Suppose $R$ is a ring with unity, and suppose there exists $n \in \mathbb{N}$ such
that $ne = 0$. Then the smallest such $n$ for which this holds is called the characteristic of $R$,
and is denoted char $R$. If no such $n \in \mathbb{N}$ exists, then $R$ is said to have characteristic zero.


In $\mathbb{Z}_n$, the fact that $n \equiv_n 0$, that is, that $n1$ is zero in $\mathbb{Z}_n$, means that char $\mathbb{Z}_n \le n$. On
the other hand, if $m \in \mathbb{N}$ and $m \equiv_n 0$, that is, if $m1$ is zero in $\mathbb{Z}_n$, then $m = kn$ for
some $k \in \mathbb{N}$, so that $m \ge n$. Thus char $\mathbb{Z}_n \ge n$, so that we have proved the following
theorem:


Theorem 7.2.18. If $n \in \mathbb{N}$, then char $\mathbb{Z}_n = n$.
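The definition of characteristic is itself a little algorithm: keep adding $e$ to itself and watch for zero. Here is a hedged Python sketch of that search. The function name, its parameters, and the cutoff `bound` are our own assumptions; in particular, coming up empty-handed within the bound does not prove that a ring has characteristic zero.

```python
# Sketch: search for the smallest n with ne = 0 by repeated addition of e.
# `bound` is an arbitrary cutoff; returning None only means no n <= bound works.

def characteristic(add, zero, unity, bound):
    total = unity                      # total holds ne, starting with n = 1
    for n in range(1, bound + 1):
        if total == zero:
            return n
        total = add(total, unity)      # move from ne to (n + 1)e
    return None

def add_mod(n):
    """Addition in Z_n, packaged as a function of two arguments."""
    return lambda x, y: (x + y) % n

print([characteristic(add_mod(n), 0, 1 % n, n) for n in range(1, 9)])
# [1, 2, 3, 4, 5, 6, 7, 8]

# In Z itself the search never hits zero, whatever cutoff we choose.
print(characteristic(lambda x, y: x + y, 0, 1, 1000))   # None
```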


If adding the unity element to itself $n$ times produces a sum of zero, then the same is
true for all elements of the ring (Exercise 11).


Theorem 7.2.19. Let $R$ be a ring with unity where char $R = n \ne 0$. Then $nx = 0$ for all
$x \in R$.


7.2.2 Fields defined 

It might seem strange to introduce our next term at this point, but it turns out to be 
more convenient as we progress through the theory of rings. A field is a special kind of 
ring, and its defining characteristics make it the most specialized kind of ring we'll study 
in this text. Although we won't delve deeply into a general theory of fields, we will notice 
a few of their characteristics that are easy to pick up along our way. 


Definition 7.2.20. Suppose $K$ is a commutative ring with unity, with the property that every
nonzero element has a multiplicative inverse. Then $K$ is called a field.


In addition to properties R1-R10, a field $K$ must have the following features:


(K11) There exists $e \in K \setminus \{0\}$ such that $e \cdot k = k$ for all $k \in K$ (G4).
(K12) For all $k \in K \setminus \{0\}$, there exists $k^{-1} \in K \setminus \{0\}$ such that $k \cdot k^{-1} = e$ (G5).
(K13) Multiplication is commutative (abelian).


Notice that properties K11-K13 complete the requirements for $K \setminus \{0\}$ to be an abelian
group under multiplication. Thus a shorthand way of defining a field $K$ is to say that
$K$ is an abelian group under addition, and $K \setminus \{0\}$ is an abelian group under multiplication.



Example 7.2.21. $\mathbb{Q}$ and $\mathbb{R}$ are fields. Also, $\mathbb{C}$ is a field. In Section 6.1, we showed that $\mathbb{C}$
with addition is an abelian group, and in Exercise 1 from Section 6.1, you showed that
$\mathbb{C} \setminus \{0 + 0i\}$ with multiplication is an abelian group.


EXERCISES 
1. Prove Theorem 7.2.1: If $R$ is a ring, then $a \cdot 0 = 0 \cdot a = 0$ for all $a \in R$.
2. Prove Theorem 7.2.2: If $R$ is a ring and $a, b \in R$, then:
(a) $(-a)b = -(ab)$,
(b) $a(-b) = -(ab)$,
(c) $(-a)(-b) = ab$.
3. Find, with verification, all units in the following rings:
(a) $\mathbb{Z}$
(b) $\mathbb{Z}_{12}$
(c) Zy 
(d) $D_{2 \times 2}$ from Example 7.1.8
4. Find nonzero elements $a, b, k$ in each of the following rings where $ak = b$ but $k$ is
not unique.
(a) $\mathbb{Z}_{10}$
(b) $\mathbb{Z}_{2 \times 2}$
5. Find a zero divisor in each of the following rings.
(a) $\mathbb{Z}_n$ for some strategically chosen $n$
(b) $\mathbb{Z}_{2 \times 2}$
6. Let $R$ and $S$ be rings. Find all units and zero divisors in $R \times S$, in terms of the units
and zero divisors of $R$ and $S$ individually.
7. Prove Theorem 7.2.10: If $a$ is a divisor of zero in a ring with unity, then $a$ is not a
unit.
8. Prove Theorem 7.2.11: If $R$ is a ring with unity and $x \in R$ is a unit, then the $y \in R$
satisfying $xy = yx = e$ is unique.
9. Suppose $R$ is a ring with unity element $e$. Show $(me)(ne) = (mn)e$ for all $m, n \in \mathbb{N}$.
10. What is char (Z4 x Zg)? Explain. 
11. Prove Theorem 7.2.19: Let $R$ be a ring with unity where char $R = n \ne 0$. Then $nx = 0$
for all $x \in R$.¹


¹Use Theorem 7.2.5.
7.3 Ring Extensions


There is a very important type of algebraic structure that we create from a given algebraic 
structure by tossing in a new element, stirring well, and letting the mixture expand into 
another algebraic structure of the same type. It’s called the process of adjoining an element, 
to create what is called an extension of the original structure. In this section we want to 
get acquainted with the creation of extensions by adjoining elements to commutative 
rings. In principle, there are really only two types of ring extensions that can result from 
adjoining an element. We'll begin with a very specific example of the first type, but instead 
of building it up as an extension of a certain ring in the most rigorous way, we'll just lay the
whole structure out there, define equality and the operations, and show that what we’ve 
presented is a ring. But don’t worry. We'll make up for our lax introduction of this ring 
in Section 7.8, where we'll see a more rigorous way to construct it. After we’ve presented 
our example of the first type of ring extension, we'll point out how other extensions of 
the same type can be created by precisely the same reasoning. Finally, we'll construct 
the canonical example of the second type. We'll use these constructions over and over 
throughout the rest of Chapter 7. 


7.3.1 Adjoining roots of ring elements 

Example 7.3.1. Let $S = \{a + b\sqrt[3]{2} + c\sqrt[3]{4} : a, b, c \in \mathbb{Z}\}$, the set of all integer linear
combinations of $\{1, \sqrt[3]{2}, \sqrt[3]{4}\}$. First, we define $x = a_1 + b_1\sqrt[3]{2} + c_1\sqrt[3]{4}$ to be equal to
$y = a_2 + b_2\sqrt[3]{2} + c_2\sqrt[3]{4}$ if $a_1 = a_2$, $b_1 = b_2$, and $c_1 = c_2$. Notice that this definition is
an equivalence relation because it's just an application of integer equality in triplicate.²
Define addition $\oplus$ and multiplication $\odot$ in a natural way, based on the extended
distributive property and the behavior of $\sqrt[3]{2}$ in the real numbers:

$$(a + b\sqrt[3]{2} + c\sqrt[3]{4}) \oplus (d + e\sqrt[3]{2} + f\sqrt[3]{4}) = (a + d) + (b + e)\sqrt[3]{2} + (c + f)\sqrt[3]{4},$$
$$(a + b\sqrt[3]{2} + c\sqrt[3]{4}) \odot (d + e\sqrt[3]{2} + f\sqrt[3]{4}) = (ad + 2bf + 2ce) + (ae + bd + 2cf)\sqrt[3]{2} + (af + be + cd)\sqrt[3]{4}. \tag{7.34}$$


Because $\oplus$ and $\odot$ on $S$ have the familiar behavior we expect when viewed within the
context of the real numbers, we could simply use $+$ and $\cdot$. Just remember that a single
entity in $S$ is of the form $a + b\sqrt[3]{2} + c\sqrt[3]{4}$, and includes partial forms like $1$, $6\sqrt[3]{2}$, or
$4 - \sqrt[3]{4}$ by letting certain coefficients be zero.

From Eqs. (7.34) and closure of integer addition and multiplication, the closure of
$\oplus$ and $\odot$ is immediately obvious. Furthermore, $S$ contains the additive identity
$0 + 0\sqrt[3]{2} + 0\sqrt[3]{4}$, additive inverses $(-a) + (-b)\sqrt[3]{2} + (-c)\sqrt[3]{4}$, and unity element
$1 + 0\sqrt[3]{2} + 0\sqrt[3]{4}$. All remaining ring properties are assumed for all of $\mathbb{R}$, and are
therefore inherited by $S$. Since $\odot$ is commutative, $S$ is a commutative ring with unity element.


²This definition of equality will raise all kinds of concerns in the mind of your professor because of
something you'll probably just assume without any basis. You're probably thinking that our definition
of equality here coincides exactly with equality in $\mathbb{R}$, so that two expressions in $S$ are equal if and only if
they are equal in $\mathbb{R}$. Clearly, if $x = y$ as we've defined equality for $S$, then $x = y$ in $\mathbb{R}$. But just because
$a_1 + b_1\sqrt[3]{2} + c_1\sqrt[3]{4} = a_2 + b_2\sqrt[3]{2} + c_2\sqrt[3]{4}$ in $\mathbb{R}$, we cannot conclude immediately that $a_1 = a_2$,
$b_1 = b_2$, and $c_1 = c_2$. If you crank out $5{,}096{,}516{,}652 - 5184\sqrt[3]{2} + 91{,}047{,}715{,}794\sqrt[3]{4}$ and
$2{,}669{,}624{,}714 + 130{,}936{,}500{,}093\sqrt[3]{2} - 11{,}347{,}811{,}196\sqrt[3]{4}$ on a TI-85 calculator, it appears they
might be equal in $\mathbb{R}$. They're not, and it is indeed true that equality in $\mathbb{R}$ implies equality in $S$. That is,
if there's a way to write a real number in the form $a + b\sqrt[3]{2} + c\sqrt[3]{4}$, then there's only one way to do it.
To prove this, you would need to know more about $\sqrt[3]{2}$ and $\sqrt[3]{4}$ as real numbers and their relationship
to each other in the context of $\mathbb{Z}$. The term is linear independence, and you'll see it in linear algebra.
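The multiplication rule in Eq. (7.34) looks less mysterious once you let a computer carry the bookkeeping. In this Python sketch an element $a + b\sqrt[3]{2} + c\sqrt[3]{4}$ is stored as the integer triple `(a, b, c)`; the representation and the function names are our own illustrative choices. The floating-point comparison at the end is only a sanity check against ordinary real arithmetic, not a proof of anything.

```python
# Sketch: elements a + b*cbrt(2) + c*cbrt(4) stored as integer triples (a, b, c).
import math

def add(x, y):
    return tuple(p + q for p, q in zip(x, y))

def mul(x, y):
    """Multiplication by the rule of Eq. (7.34)."""
    a, b, c = x
    d, e, f = y
    return (a*d + 2*b*f + 2*c*e, a*e + b*d + 2*c*f, a*f + b*e + c*d)

def to_real(x):
    """Approximate real value of the triple, for comparison only."""
    a, b, c = x
    r = 2 ** (1 / 3)
    return a + b * r + c * r * r

root = (0, 1, 0)                      # the adjoined element cbrt(2)
print(mul(root, root))                # (0, 0, 1), that is, cbrt(4)
print(mul(mul(root, root), root))     # (2, 0, 0), that is, 2

x, y = (1, 2, 3), (4, -5, 6)
print(mul(x, y))                      # (-2, 39, 8)
print(math.isclose(to_real(mul(x, y)), to_real(x) * to_real(y)))
```

Notice that the cube of `root` lands back among the integers, which is why three coordinates are all we ever need.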


The notation we use for the ring in Example 7.3.1 is $\mathbb{Z}[\sqrt[3]{2}]$. This notation is meant to
denote that the ring of integers has had an additional element $\sqrt[3]{2}$ thrown in, or adjoined,
as we say. Adjoining an element to an algebraic structure is obviously different from
unioning it onto the set. Instead of merely tossing it in as one more additional element,
we toss it in, then combine it by $+$ and $\cdot$ with itself and all other elements to expand into
a ring. Thus you can see that the presence of $\sqrt[3]{4}$ in $\mathbb{Z}[\sqrt[3]{2}]$ is necessary so that we have
closure of multiplication: $(\sqrt[3]{2})^2 = \sqrt[3]{4}$. However, the fact that $(\sqrt[3]{2})^3 = 2 \in \mathbb{Z}$ means
that expressions of the form $a + b\sqrt[3]{2} + c\sqrt[3]{4}$ are all that are necessary. By reasoning
similar to that in Example 7.3.1, we could begin with $\mathbb{Z}$ (or $\mathbb{Q}$), choose $n \in \mathbb{N}$ and
$x \in \mathbb{Z}$, define equality, addition, and multiplication, and show all ring properties (plus
commutativity) for

$$\mathbb{Z}[\sqrt[n]{x}] = \{a_0 + a_1\sqrt[n]{x} + a_2\sqrt[n]{x^2} + \cdots + a_{n-1}\sqrt[n]{x^{n-1}} : a_i \in \mathbb{Z}\}. \tag{7.35}$$

The form of elements of $\mathbb{Z}[\sqrt[n]{x}]$ in Eq. (7.35) assumes that $n$ is the smallest natural
number $k$ such that $(\sqrt[n]{x})^k \in \mathbb{Z}$, so that none of the terms $\sqrt[n]{x}, \ldots, \sqrt[n]{x^{n-1}}$ is an integer.

For example, $\mathbb{Q}[\sqrt{5}] = \{a + b\sqrt{5} : a, b \in \mathbb{Q}\}$ fits the form of Eq. (7.35). There's
no reason $x$ in Eq. (7.35) cannot be negative, and $\mathbb{Z}[\sqrt{-1}] = \mathbb{Z}[i] = \{a + bi : a, b \in \mathbb{Z}\}$
is called the Gaussian integers. It turns out that $\mathbb{Z}[\sqrt{-5}]$ is an important ring, and we'll
take a look at it in Section 7.5. Finally, $\mathbb{R}[i] = \mathbb{C}$. Notice how $\mathbb{Z}$ is the subring of $\mathbb{Z}[\sqrt[3]{2}]$
consisting of all elements $a + b\sqrt[3]{2} + c\sqrt[3]{4}$ where $b = c = 0$.

For the class of rings in the next theorem, we'll need to have a handle on the units 
when we get to Section 7.5. The proof of the next theorem is important to work through 
because you'll need to mimic the algebraic manipulation in some of our later work. 


Theorem 7.3.2. Suppose $p \in \mathbb{N}$ is prime. Then the only units in $\mathbb{Z}[\sqrt{-p}]$ are $\pm 1$.


Proof: Suppose $a + b\sqrt{-p}$ is a unit. We show that $b = 0$ and $a = \pm 1$ by supposing

$$(a + b\sqrt{-p})(c + d\sqrt{-p}) = 1 \tag{7.36}$$

and drawing conclusions about $a, b, c, d$. Multiplying out the terms in Eq. (7.36)
and using the definition of equality in $\mathbb{Z}[\sqrt{-p}]$ yields

$$ac - pbd = 1 \quad \text{and} \quad ad + bc = 0. \tag{7.37}$$

Squaring each side of the equations in (7.37) yields

$$a^2c^2 - 2pabcd + p^2b^2d^2 = 1 \quad \text{and} \quad a^2d^2 + 2abcd + b^2c^2 = 0. \tag{7.38}$$

Multiplying the second equation in (7.38) by $p$ and adding the two equations yields

$$a^2c^2 + pa^2d^2 + p^2b^2d^2 + pb^2c^2 = 1$$
$$a^2(c^2 + pd^2) + pb^2(pd^2 + c^2) = 1$$
$$(a^2 + pb^2)(c^2 + pd^2) = 1. \tag{7.39}$$

Each factor in Eq. (7.39) is a positive integer because the components are squared
and $p > 0$. Thus by Exercise 10f from Section 2.3,

$$a^2 + pb^2 = 1 \quad \text{and} \quad c^2 + pd^2 = 1. \tag{7.40}$$

Furthermore, since $p \ge 2$, it must be that $b = d = 0$, so that $a = \pm 1$.
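The last step of the proof says that $a^2 + pb^2 = 1$ leaves no room for a nonzero $b$ once $p \ge 2$. If you'd like to see that numerically before believing it, here is a small brute-force sketch. The function name, the search window, and the sample primes are illustrative assumptions; a finite search can only corroborate the argument, not replace it.

```python
# Sketch: list integer solutions of a^2 + p*b^2 = 1 inside a finite window.
# Any b with |b| >= 1 already forces p*b^2 >= 2, so a small window suffices
# for illustration.

def norm_one_solutions(p, bound):
    return [
        (a, b)
        for a in range(-bound, bound + 1)
        for b in range(-bound, bound + 1)
        if a * a + p * b * b == 1
    ]

for p in (2, 3, 5, 7):
    print(p, norm_one_solutions(p, 20))   # each line lists [(-1, 0), (1, 0)]
```

The two surviving pairs correspond to $a + b\sqrt{-p} = \pm 1$, matching the theorem.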


7.3.2 Polynomial rings

The other type of extension we want to create might seem fundamentally different from
the preceding ones, but the principle is really the same. It just has one notable difference
in that the new element we adjoin is, in a sense, more foreign to the original ring than
numbers like $\sqrt{-5}$ are to $\mathbb{Z}$. The relationship of $\sqrt{-5}$ to $\mathbb{Z}$ is characterized by the fact
that $(\sqrt{-5})^2 = -5 \in \mathbb{Z}$, or if you prefer, $(\sqrt{-5})^2 + 5 = 0$. Similarly, if $x = \sqrt{1 + \sqrt{2}}$,
then $(x^2 - 1)^2 - 2 = 0$. Thus, as with $\sqrt{-5}$, there is some way to manipulate $x$ using
only the ring elements and operations to produce zero. The term that describes this
relationship of $\sqrt{-5}$ and $\sqrt{1 + \sqrt{2}}$ to the integers is algebraic, and numbers that are not
algebraic are called transcendental. For example, $\pi$ is transcendental over $\mathbb{Z}$ because there
is no way to combine $\pi$ and any finite set of integers using the ring operations a finite
number of times to produce zero. Strict definitions of these terms will come in your later
work in algebra. For now, we simply construct an example where the symbol we adjoin
is transcendental over $\mathbb{Z}$ because we define the ring and the behavior of the symbol to
make it so.

Let $R$ be a commutative ring, and write $R[t]$ to mean the set of all polynomials in
the variable $t$, where the coefficients are elements of $R$. That is,

$$R[t] = \{a_n t^n + a_{n-1} t^{n-1} + \cdots + a_1 t + a_0 : n \in \mathbb{W},\ a_k \in R \text{ for all } k, \text{ and } a_n \ne 0 \text{ if } n \ne 0\}. \tag{7.41}$$

The reason we insist on $a_n \ne 0$ for $n \ne 0$ is that it would be kind of silly to suggest
that the highest power term of an element of $R[t]$ is $a_n t^n$ when its coefficient $a_n$ wipes it
out. However, if $n = 0$, we do want to allow for $a_0 = 0$, the zero polynomial. We address
an arbitrary element of $R[t]$ as $f$, as if it were a function. However, we won't write it
as $f(t)$, because we're not interested at this point in an element of $R[t]$ primarily as an
expression into which we substitute numeric values for $t$, but more as a string of symbols
whose behavior in the ring we're constructing involves the symbol $t$ merely as a way to
describe how elements of $R[t]$ add and multiply. First, we define $f = a_m t^m + \cdots + a_0$
and $g = b_n t^n + \cdots + b_0$ to be equal if $m = n$ and $a_k = b_k$ for all $0 \le k \le n$, which
is clearly an equivalence relation. Concerning the binary operations on $R[t]$, define
addition and multiplication of two elements in the familiar way of adding and multiplying
two polynomials. Notationally it's very ugly to state the definitions of addition and
multiplication formally, but you're certainly familiar with the way they're done. Assuming
this, let's check that all ring properties R1-R10 are satisfied.

First of all, the fact that $R$ has properties R1-R3 and R6 means that $R[t]$ does, too.
Letting $n = 0$ and $a_0 = 0$ reveals that $f = 0$ (viewed as a polynomial in $R[t]$ and not as
a mere element of $R$) is the additive identity (R4). The existence of additive inverses in $R$
makes property R5 clear. Considering the way polynomials multiply, properties R7-R10
call on all the similar properties of both addition and commutative multiplication in $R$.
Showing these is a messy task, but not difficult. Furthermore, multiplication in $R[t]$ is
commutative because $R$ is commutative, and if $R$ has a unity element $e$, the polynomial $e$
is the unity element in $R[t]$. Thus $R[t]$ is a commutative ring and has a unity element
if $R$ does. The new ring $R[t]$ is called the polynomial ring over $R$.

If we were to write an element of $\mathbb{Z}[\sqrt[3]{2}]$ as $c(\sqrt[3]{2})^2 + b\sqrt[3]{2} + a$, we could say that
elements of $\mathbb{Z}[\sqrt[3]{2}]$ are three-term polynomials in the symbol $\sqrt[3]{2}$. The fundamental
difference between $\mathbb{Z}[\sqrt[3]{2}]$ and $R[t]$ is that the three-term polynomials $c(\sqrt[3]{2})^2 + b\sqrt[3]{2} + a$
are all that is necessary to have closure of the ring operations when $\sqrt[3]{2}$ is adjoined to $\mathbb{Z}$.
The behavior of $\sqrt[3]{2}$, that is, the fact that $(\sqrt[3]{2})^3 = 2 \in \mathbb{Z}$, means that it's not necessary
to have terms of the form $(\sqrt[3]{2})^n$ for $n \ge 3$. However, in $R[t]$, polynomials can be of any
degree, whatever that means.


7.3.3 Degree of a polynomial

If $f \in R[t]$ is written as $f = a_n t^n + \cdots + a_0$, where $a_n \ne 0$, we define the degree of
$f$ to be $n$, and we write $\deg f = n$. This definition does not assign a degree to the
zero polynomial, so we won't assign a degree to it. Some authors define $\deg 0 = -\infty$.
Even though $-\infty$ is not a real number, this degree assignment can serve as a way to
make theorems involving $\deg f$ hold for the zero polynomial. Instead of assigning a
degree to the zero polynomial and making it a special case in theorems, we will agree
that polynomials whose degree we're working with are always nonzero polynomials. In
$R[t]$, the degree of a polynomial is a measure of its size, like $|x|$ in $\mathbb{R}$, even though its
properties do not really jibe with the norm properties N1-N3 (beginning on page 57)
and Theorem 2.3.22. But as we'll see later, it gives us at least some way to apply the
WOP to $R[t] \setminus \{0\}$. Here are some properties of polynomial degree that you'll prove in
Exercise 5.


Theorem 7.3.3. Suppose R is a commutative ring and f, g ∈ R[t] are nonzero polynomials.
Then the following hold:


1. If deg f = deg g = n, then deg(f + g) ≤ n.
2. If deg f > deg g, then deg(f + g) = deg f.
3. deg(fg) ≤ deg f + deg g.


It's easy to see why part 1 in Theorem 7.3.3 is an inequality, for if f = 2t² + 1 and
g = −2t² + 4t are elements of Z[t], then f + g = 4t + 1. However, the fact that part 3
is an inequality instead of an equation might seem strange. The existence of zero divisors
in R allows for this odd behavior in R[t]. For example, in Z₆[t],

(2t² + 3)(3t + 3) = 6t³ + 6t² + 9t + 9 = 3t + 3. (7.42)


Thus, the degree of a product can be strictly less than the sum of the degrees of the
factors. Equation (7.42) also illustrates that f | g is possible for some f and g where
deg f > deg g. These unfamiliar idiosyncrasies can happen because of the presence of
zero divisors in R. In Section 7.5, these behaviors will go away when we look at the
polynomial ring over an integral domain.
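The computation in Eq. (7.42) can be checked mechanically. The following is a minimal sketch, assuming a polynomial is stored as a list of coefficients with the constant term first; the helper names `poly_mul_mod` and `degree` are illustrative and not part of the text.

```python
def poly_mul_mod(f, g, n):
    """Multiply two polynomials over Z_n (coefficient lists, constant term first)."""
    prod = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            prod[i + j] = (prod[i + j] + a * b) % n
    # Drop leading coefficients that reduced to zero, keeping [0] for the zero polynomial.
    while len(prod) > 1 and prod[-1] == 0:
        prod.pop()
    return prod


def degree(f):
    """Degree of a nonzero polynomial; None for the zero polynomial (no degree assigned)."""
    return None if f == [0] else len(f) - 1


f = [3, 0, 2]  # 2t^2 + 3
g = [3, 3]     # 3t + 3

product_mod6 = poly_mul_mod(f, g, 6)  # zero divisors present in Z_6
product_mod7 = poly_mul_mod(f, g, 7)  # Z_7 has no zero divisors

print(product_mod6, degree(product_mod6))
print(product_mod7, degree(product_mod7))
```

In Z₆[t] the two leading terms vanish and the product has degree 1, while in Z₇[t] the degree is the full sum 2 + 1 = 3.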


EXERCISES 


1. Using i, j, k as in the quaternion group (Example 6.1.12), construct the ring extension
Z[i, j, k], defining equality, addition, and multiplication, then showing that all ring
properties R1–R10 are satisfied. Is it commutative?


2. Finding the units in a ring amounts to solving the equation xy = 1. In Z₆, the only
units are 1 and 5, so the equation xy ≡₆ 1 implies x, y ∈ {1, 5}. Use this fact and the
technique in the proof of Theorem 7.3.2 to find all 16 units in Z₆[√2].


3. Find, with verification, all units in Z[i]. 


4. Prove that the following commutative rings with unity are fields by showing that every 
nonzero element is a unit. 


(a) Q[√2]
(b) Q[i]


5. Prove Theorem 7.3.3: Suppose R is a commutative ring and f, g ∈ R[t] are nonzero
polynomials. Then the following hold:


(a) If deg f = deg g = n, then deg(f + g) ≤ n.
(b) If deg f > deg g, then deg(f + g) = deg f.
(c) deg(fg) ≤ deg f + deg g.

6. Calculate (2t² + 4t + 1)(3t² + 3t + 4) in Z,[t], then in Z,[t].


7. Give an example of a ring R and f € R[t] such that f is a divisor of zero. 


7.4 Ideals


7.4.1 Definition and examples 


In the same way a normal subgroup is a special kind of subgroup that exhibits a
characteristic stronger than closure, we define a special class of subring where we have
something stronger than closure of multiplication:




Definition 7.4.1. Suppose R is a ring and I ⊆ R is closed under addition, contains 0, and
is closed under additive inverses. Suppose I also has the property that rx ∈ I for all x ∈ I
and r ∈ R. Then I is called a left ideal of R. Similarly, if xr ∈ I for all x ∈ I and r ∈ R, then
I is called a right ideal. If I is both a left and right ideal, it is called a two-sided ideal. If R is
commutative, there is no distinction between left and right ideals, and we will simply use the
term ideal. Also, for simplicity, if R is not commutative, then we will refer simply to an ideal to
denote an arbitrary left or right ideal.


Since R is an abelian group with respect to its addition operation, I is an additive
subgroup, even a normal one because R is abelian as an additive group. What makes an
ideal, say a left ideal, more than a subring is that it is more than simply closed with regard
to multiplication. A left ideal has properties S1–S4, but property S4 is replaced with the
stronger property that rx ∈ I for all x ∈ I and r ∈ R. Here are the defining properties of
a left (or right) ideal:


(Y1) I is closed under addition (S1).
(Y2) I contains the additive identity (S2).
(Y3) I is closed under additive inverses (S3).
(Y4) For all x ∈ I and r ∈ R, rx ∈ I (or xr ∈ I).


If I is a left ideal of R, we might say that I is impenetrable from the left against
multiplication, even if the thing multiplied by is outside I. An ideal absorbs, if you will,
multiplication from the left. Similarly for a right ideal.

In a ring R, the ideal {0} is called the trivial ideal. Also, R is an ideal of itself. All other
ideals besides {0} and R are called proper ideals.


Example 7.4.2. Let S be the set of all polynomials in Z[t] whose constant term is even. 
Then S is an ideal in Z[t] (Exercise 1). 


Example 7.4.3. In the ring Z, use the notation (n) = {kn : k ∈ Z} in precisely the
same way we did in Section 6.3. We've already shown that (n) is a subgroup of the
additive group, so to show that (n) is an ideal in Z, the only thing that must be shown
is that it absorbs multiplication. But this is clearly true, for if kn ∈ (n) and m ∈ Z, then
m(kn) = (mk)n ∈ (n).


Example 7.4.3 is a special case of a general type of ideal in a ring R with unity. Starting
with any element a ∈ R, we simply create a left (or right) ideal by multiplying a on the
left (or right) by every r ∈ R. That leads you right to the following theorem, which you'll
prove in Exercise 2.


Theorem 7.4.4. Suppose R is a ring and a ∈ R. Then the set

Ra = {ra : r ∈ R} (7.43)

is a left ideal in R.


By similar reasoning, aR = {ar : r ∈ R} is a right ideal in R. If R does not have a unity
element, then the ideals Ra and aR might not contain a.


Example 7.4.5. Let E be the ring of even integers. Then

6E = {..., −24, −12, 0, 12, 24, ...}. (7.44)
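Example 7.4.5 shows the remark about rings without unity in action: the generator 6 is not an element of 6E. A small sketch makes this visible on a finite window of the even integers; the window size and the names `even_window` and `six_E` are assumptions made only for illustration.

```python
def even_window(bound):
    """Even integers in [-bound, bound]; a finite stand-in for the ring E."""
    return [k for k in range(-bound, bound + 1) if k % 2 == 0]


E = even_window(30)
six_E = sorted({6 * r for r in E})  # the set {6r : r in E}, restricted to the window

print(six_E[:5], "...")
print("6 in 6E?", 6 in six_E)
print("all multiples of 12?", all(x % 12 == 0 for x in six_E))
```

Every product 6r with r even is a multiple of 12, so 6 itself never appears.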


Example 7.4.6. For the polynomial ring Z[t], (3t + 2)Z[t] is the ideal of all multiples
of f = 3t + 2 in Z[t]. Since Z[t] has unity, 3t + 2 ∈ (3t + 2)Z[t].


A result analogous to Theorem 7.1.10 holds for ideals, but we must distinguish 
between left and right (Exercise 5). 


Theorem 7.4.7. Suppose F is a family of left (or right) ideals of a ring R. Then ⋂_{I∈F} I is a
left (or right) ideal of R.


In Exercise 13, you'll show that the intersection of a left ideal and a right ideal need 
not be either a left or right ideal. The example you'll use to illustrate this requires Theo- 
rem 7.4.11 and some familiarity with Example 7.4.12. In a commutative ring, where there 
is no difference between left and right ideals, Theorem 7.4.7 states that the intersection 
of a family of ideals is an ideal. 

The union of two ideals is not necessarily an ideal (Exercise 7). However, there is 
a theorem that will come in handy in Section 7.6 that says something about the union 
across a special family of ideals (Exercise 8). 


Theorem 7.4.8. Suppose {Iₙ}, n ∈ N, is a family of left (or right) ideals of a ring with the
property that Iₙ ⊆ Iₙ₊₁ for all n ∈ N. Then ⋃_{n=1}^{∞} Iₙ is a left (or right) ideal.


7.4.2 Generated ideals 


Analogous to the subgroup generated by a subset of a group, we can define a similar term 
for ideals. 


Definition 7.4.9. Suppose R is a ring and A ⊆ R is nonempty. Suppose I ⊆ R has the
following properties:


(U1) A ⊆ I.
(U2) I is a left ideal of R.
(U3) If J is a left ideal of R and A ⊆ J, then I ⊆ J.

Then I is called a left ideal generated by A, and is denoted (A)ₗ.


We can similarly define a right ideal generated by A and denote it (A)ᵣ. If R is
commutative, then there is no distinction between (A)ₗ and (A)ᵣ, so we denote such an
ideal (A) and call it the ideal generated by A. Does (A)ₗ exist? If so, is it unique? And, if
so, what does it look like? Here's your answer (Exercise 9).




Theorem 7.4.10. Suppose R is a ring and A ⊆ R is nonempty. Let F be the family of all left
ideals of R that contain all elements of A. Then (A)ₗ exists uniquely and can be written as

(A)ₗ = ⋂_{I∈F} I. (7.45)


If the path we traveled when we discussed subgroups generated by A ⊆ G suggests
a direction for us to go from here, we would consider the left ideal generated by a single
element a ∈ R and show that the top-down form of (a)ₗ in Theorem 7.4.10 is equivalent
to a form that can be built from the bottom up, by starting with a and building up to a
subset of R that has properties U1–U3. Let's do this now. In order for this program to
work, R must have a unity element.


Theorem 7.4.11. Suppose R is a ring with unity, and a ∈ R. Then (a)ₗ = Ra (the
construction in Eq. (7.43)) and (a)ᵣ = aR.


Proof: We prove the claim for (a)ₗ only, by showing that Ra satisfies properties U1–U3.
The proof for (a)ᵣ is similar. First, since R has a unity e, a = ea ∈ Ra, so that U1 is
satisfied. Second, from Theorem 7.4.4, Ra is a left ideal of R, so that U2 is satisfied.
Finally, suppose J is any left ideal of R that contains a, and pick any x ∈ Ra. Then x = ra
for some r ∈ R. But since J is a left ideal of R and a ∈ J, it must be that ra ∈ J, so that
Ra ⊆ J. Thus U3 is satisfied and (a)ₗ = Ra. ■


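Example 7.4.12. Describe the left and right ideals of Z_{2×2} generated by
$\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}$.


Solution: We'll describe the left ideal here and then you'll describe the right ideal in
Exercise 11. For $\begin{bmatrix} a & b \\ c & d \end{bmatrix} \in Z_{2 \times 2}$,

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} = \begin{bmatrix} 2a & 3b \\ 2c & 3d \end{bmatrix}. \tag{7.46}$$

Thus the left ideal is the set of all 2 × 2 matrices whose first column entries are even,
and whose second column entries are multiples of three. ■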
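The description of the left ideal in Example 7.4.12 can be spot-checked numerically. This is only a sketch over a small window of integer entries, not a proof; the names `mat_mul`, `A`, and `fits_description` are hypothetical helpers introduced for the check.

```python
from itertools import product


def mat_mul(x, y):
    """Product of two 2 x 2 integer matrices given as nested tuples."""
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


A = ((2, 0), (0, 3))  # the generator from Example 7.4.12


def fits_description(m):
    """True when the first column is even and the second column is a multiple of three."""
    return all(row[0] % 2 == 0 and row[1] % 3 == 0 for row in m)


# Multiply A on the left by every matrix with entries in a small window.
window = range(-2, 3)
samples = [mat_mul(((a, b), (c, d)), A) for a, b, c, d in product(window, repeat=4)]
all_fit = all(fits_description(m) for m in samples)

print(mat_mul(((1, 2), (3, 4)), A))
print("every sampled product fits:", all_fit)
```

Conversely, any matrix with entries 2x, 3y, 2z, 3w in those positions is the product of the matrix with entries x, y, z, w and A, which is why the description captures the whole left ideal.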


Regardless of whether the presence of a unity allows (a)ₗ to be written in the form
Ra, the left ideal generated by a single element a ∈ R is called the principal left ideal
generated by a. If R does not have a unity element, then Ra might not contain a. Thus the
construction of (a) would be a bit more complicated than that in Theorem 7.4.11. When
we work with (a), we'll always be in the context of a ring with unity. In Exercise 14,
you'll construct a form analogous to Eq. (7.43) for (A)ₗ, where A = {a₁, a₂, ..., aₙ}. To
simplify the notation, we'll write (A) = (a₁, ..., aₙ).

If R is a commutative ring with unity, there is an important link between the existence
of proper ideals and what kind of ring R is. We'll prove one direction of the next theorem,
and you'll prove the other in Exercise 16.


Theorem 7.4.13. Suppose R is a commutative ring with unity. Then R is a field if and only
if it has no proper ideals.
Proof: Suppose R is a commutative ring with unity e.


(⇐) Suppose R has no proper ideals. Choose any nonzero x ∈ R. We show that x has
a multiplicative inverse in R. Since x ∈ (x) and x ≠ 0, the ideal (x) is not {0}, and
because R has no proper ideals, (x) = R, so that e ∈ (x). With Theorem 7.4.11,
(x) = Rx, so that there exists r ∈ R such that e = rx. Thus x has a multiplicative
inverse in R and R is therefore a field.


(⇒) (Exercise 16.) ■
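Theorem 7.4.13 can be illustrated in the rings Zₙ. The sketch below relies on the standard fact that every ideal of Zₙ is principal, so it is enough to inspect the sets Rx; the function names are illustrative only.

```python
def principal_ideal(x, n):
    """The ideal Rx = {rx : r in Z_n} of Z_n."""
    return {(r * x) % n for r in range(n)}


def only_trivial_ideals(n):
    """True when every nonzero x generates all of Z_n, so {0} and Z_n are the only ideals."""
    whole = set(range(n))
    return all(principal_ideal(x, n) == whole for x in range(1, n))


print("Z_7:", only_trivial_ideals(7))
print("Z_6:", only_trivial_ideals(6), "since (2) =", sorted(principal_ideal(2, 6)))
```

In Z₇ each nonzero x generates the whole ring, matching the fact that Z₇ is a field, while in Z₆ the ideal (2) is proper.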


Suppose I is a proper ideal of a ring R, and a ∈ R\I. Similar to the way we adjoin
an element to a ring to create an extension, we can construct a left or right ideal of R
that contains a and all elements of I. Our next theorem addresses the construction for
the left-sided case, and you'll prove it in Exercise 19. We'll need this construction in
Section 7.9 in the context of a commutative ring with unity.


Theorem 7.4.14. Let R be a ring, I an ideal of R, and fix a ∈ R\I. Let

J = {ra + i : r ∈ R, i ∈ I}. (7.47)

Then J is a left ideal of R.


7.4.3 Prime ideals 


If p ∈ N is prime and p | ab, then either p | a or p | b (Theorem 2.7.15). Another way
to say this is: If pk₁ = ab, then either pk₂ = a or pk₃ = b. Using the language of the
ideal generated by p and Theorem 7.4.11, yet another way to say this is: If ab ∈ (p), then
either a ∈ (p) or b ∈ (p). In a general ring, we assign a term to any proper ideal with this
special property, whether or not the ideal is principal.


Definition 7.4.15. Suppose R is a ring, and P is a proper ideal of R with the property that
ab ∈ P implies either a ∈ P or b ∈ P. Then P is called a prime ideal of R.
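Definition 7.4.15 can be tested by brute force in a small finite ring. The following sketch checks the principal ideals (2), (3), (4), and (6) of Z₁₂; the choice of Z₁₂ and the function names are assumptions made for illustration, and the check excludes {0} and the whole ring because the text reserves "proper" for the remaining ideals.

```python
def principal_ideal(x, n):
    """The ideal (x) = {rx : r in Z_n} of Z_n."""
    return {(r * x) % n for r in range(n)}


def is_prime_ideal(ideal, n):
    """Check Definition 7.4.15 in Z_n by testing every pair (a, b)."""
    ring = set(range(n))
    if ideal == {0} or ideal == ring:
        return False  # not a proper ideal in the sense used in the text
    return all(
        a in ideal or b in ideal
        for a in ring
        for b in ring
        if (a * b) % n in ideal
    )


results = {x: is_prime_ideal(principal_ideal(x, 12), 12) for x in (2, 3, 4, 6)}
print(results)
```

The ideals (4) and (6) fail because 2 · 2 ∈ (4) and 2 · 3 ∈ (6) while neither factor lies in the ideal.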


In Exercise 20 you'll determine whether certain ideals are prime. Although we haven't
looked into properties of prime elements in a ring yet, it deserves to be said right away
that we must be careful about jumping to conclusions about prime ideals and principal
ideals generated by prime elements. In Section 7.5, we'll see that if a principal ideal (a) is a
prime ideal, then a must be a prime element. However, just because a is a prime element,
it does not mean that (a) is a prime ideal. Example 7.5.17 will illustrate how
this can happen. In Section 7.6, where we look at principal ideal domains, we'll see that a
prime ideal will always be generated by a prime element. (See Exercise 20 for examples.)


7.4.4 Maximal ideals 


Any proper ideal of a ring R is contained in a larger ideal, namely R itself. But if an ideal
is as large as it can be without actually being all of R, then we call it maximal. Here's
the definition. Since we're primarily interested in maximal ideals in a commutative ring,
we'll set the definition in that context.


Definition 7.4.16. Suppose R is a commutative ring, and I is a proper ideal with the
property that if J is any ideal such that I ⊆ J ⊆ R, then either J = I or J = R. Then I is called
a maximal ideal.


We can visualize a maximal ideal M in the following way: If M is a maximal ideal,
then for any r ∈ R\M, the only ideal of R that contains r and all elements of M is R
itself.


Example 7.4.17. If p € N is prime, then the ideal it generates in Z is maximal (Exer- 
cise 21). 


Example 7.4.18. In Z, (4) is not maximal, for it is a proper subset of the ideal of even 
integers. 


Example 7.4.19. In Z_{2×2}, the right ideal I = {[…] : a, b, c, d ∈ Z} is not maximal,
for J = {[…] : a, b, c, d ∈ Z} is also a right ideal (Exercise 22).


We see right away that there is a relationship between prime and maximal ideals in
commutative rings with unity (Exercise 23).


Theorem 7.4.20. If M is a maximal ideal in a commutative ring with unity, then M is 
prime. 


The reason that the ring in Theorem 7.4.20 has to have a unity element can be seen
by letting R be the even integers and M = (4) = {..., −8, −4, 0, 4, 8, ...}. Since 2 · 2
∈ (4) but 2 ∉ (4), the ideal (4) is not prime. However, (4) is maximal (Exercise 24).

About now you should be asking for either a theorem claiming that prime ideals in
commutative rings with unity are also maximal (so that the terms are logically equivalent)
or an example of a prime ideal that is not maximal. Well, they're not logically equivalent,
and if you've done the exercises up to this point, you have got verification right before your
eyes (Exercise 25). When we restrict ourselves to principal ideal domains in Section 7.6,
we'll see that prime ideals are also maximal.


EXERCISES 


1. Show that S defined in Example 7.4.2 is an ideal in Z[t]. 


2. Prove Theorem 7.4.4: Suppose R is a ring and a ∈ R. Then the set

Ra = {ra : r ∈ R} (7.48)

is a left ideal in R.
3. Let M = {[…] : a, b ∈ Z}. Show that M is a right ideal in Z_{2×2} but not a left ideal.


4. Let R be a commutative ring and Z the set of all zero divisors in R. What is wrong
with the following proof that Z ∪ {0} is an ideal in R?


Proof: Suppose R is a commutative ring and Z the zero divisors in R.


(Y1) Let z₁, z₂ ∈ Z ∪ {0}. If z₁ = z₂ = 0, then clearly z₁ + z₂ = 0 ∈ Z ∪ {0}. If
precisely one of z₁, z₂ is zero, then without loss of generality, z₁ = 0 and
z₂ ≠ 0. Then z₂ is a zero divisor, so there exists nonzero a ∈ R such that
z₂a = 0. Thus (z₁ + z₂)a = z₂a = 0, so that z₁ + z₂ is a zero divisor.
If neither z₁ nor z₂ is zero, then there exist nonzero a₁, a₂ ∈ R such that
z₁a₁ = z₂a₂ = 0. Thus

(z₁ + z₂)a₁a₂ = z₁a₁a₂ + z₂a₂a₁ = 0a₂ + 0a₁ = 0,

so that z₁ + z₂ is a zero divisor. In any case z₁ + z₂ ∈ Z ∪ {0}, so that Z ∪ {0}
is closed under addition.

(Y2) By definition, 0 ∈ Z ∪ {0}.

(Y3) Let z ∈ Z ∪ {0}. If z = 0, then −z = 0 ∈ Z ∪ {0}. If z ≠ 0, then there exists
nonzero a ∈ R such that za = 0. Thus (−z)(a) = −za = −0 = 0, so that
−z is a zero divisor. In either case −z ∈ Z ∪ {0}.

(Y4) Let z ∈ Z ∪ {0}, and r ∈ R. If z = 0, then rz = 0 ∈ Z ∪ {0}. If z ≠ 0, then
there exists nonzero a ∈ R such that za = 0. Since a(rz) = r(az) = 0, it
follows that rz is a zero divisor.


Since Z ∪ {0} satisfies properties Y1–Y4, Z ∪ {0} is an ideal in R. ■


5. Prove the left-sided case of Theorem 7.4.7: Suppose F is a family of left ideals of a
ring R. Then ⋂_{I∈F} I is a left ideal of R.


6. In Z, what is (6) ∩ (15)?

7. Demonstrate a ring R and two ideals I₁ and I₂ such that I₁ ∪ I₂ is not an ideal in R.


8. Prove the left-sided case of Theorem 7.4.8: Suppose {Iₙ}, n ∈ N, is a family of left ideals
of a ring R with the property that Iₙ ⊆ Iₙ₊₁ for all n ∈ N. Then ⋃_{n=1}^{∞} Iₙ is a left ideal
in R.


9. Prove Theorem 7.4.10: Suppose R is a ring and A ⊆ R is nonempty. Let F be the
family of all left ideals of R that contain all elements of A. Then (A)ₗ exists uniquely
and can be written as

(A)ₗ = ⋂_{I∈F} I. (7.49)

10. In Z[t], describe (t).

11. Describe the right ideal in Example 7.4.12.

12. Find the principal left and right ideals generated by A = […] in Z_{2×2}.
13. Let M = {[…] : a, b ∈ Z}. Let r = […] and consider the following subsets of M:

rM = {rx : x ∈ M} (7.50)
Mr = {xr : x ∈ M}. (7.51)

(a) Show that M is a ring by showing it is a subring of Z_{2×2}.

(b) By Theorem 7.4.4, the sets in Eqs. (7.50) and (7.51) are ideals in M. Show that
neither is a subset of the other.

(c) Show that rM ∩ Mr is neither a left nor right ideal in M.


14. Suppose R is a ring with unity, and A = {a₁, a₂, ..., aₙ} is a subset of R. Construct
a form of (A)ₗ that is analogous to (a)ₗ = Ra and show that it satisfies U1–U3.


15. Show that the ideal defined in Example 7.4.2 is actually (2, t).

16. Prove the ⇒ direction of Theorem 7.4.13: If R is a field, then it has no proper ideals.


17. Suppose R is a ring with unity element, and suppose a ∈ R is a unit. Show that
(a)ₗ = (a)ᵣ = R.


18. In the ring of integers, let a, b € Z\{0} and g = gcd(a, b). Show that (a, b) = (g). 


19. Prove Theorem 7.4.14: Let R be a ring, I an ideal of R, and fix a ∈ R\I. Let J =
{ra + i : r ∈ R, i ∈ I}. Then J is a left ideal of R.


20. Determine with proof whether each of the following ideals is prime:
(a) (6) and (7) in Z.
(b) (2, t) in Z[t].
(c) (t) in Z[t].
(d) The left ideal from Exercise 12.


21. Show that if p ∈ N is prime, then the ideal it generates in Z is maximal.

22. Verify that I and J in Example 7.4.19 are right ideals of Z_{2×2}.


23. Prove Theorem 7.4.20: If M is a maximal ideal in a commutative ring with unity, 
then M is prime. 


24. Show that (4) is maximal in the ring of even integers. 
25. Demonstrate an example of a prime ideal that is not maximal. 


26. In Z × Z, is (5) × (3) prime and/or maximal? Verify your answer.


³You shouldn't have to look far.


7.5 Integral Domains


If R is a commutative ring with unity that has no zero divisors, R is called an integral
domain, or domain for short. The absence of zero divisors means that the principle of
zero products applies in a domain by definition. Thus not only is Z a ring, it's also a
domain. Since all nonzero elements of a field are units, a field contains no zero divisors
and is therefore a domain. Thus Q, R, and C are domains. In Exercise 5 from Section 7.1,
you found zero divisors in Zₙ for at least some n. Which values of n cause Zₙ to be a
domain will follow from Theorems 7.5.1 and 7.5.7. You'll show the following (Exercise 1)
with the help of a result from Section 2.7:


Theorem 7.5.1. If p ∈ N is prime, then Zₚ is a domain.


Example 7.5.2. Z{/2] and Qgp are domains, for as subrings of R, they are commuta- 
tive, contain 1, and have no zero divisors. 


In a domain, the fact that there are no zero divisors makes the following true (Exercise 2):

Theorem 7.5.3. If D is a domain, and if ac = bc and c ≠ 0, then a = b.

With multiplicative cancellation in hand, you can prove the following (Exercise 3).

Theorem 7.5.4. Suppose D is a domain and a | b in D. Then the k ∈ D such that ak = b
is unique.


Then with Theorem 7.5.4, the following term becomes meaningful. 


Definition 7.5.5. Suppose D is a domain, a, b ∈ D, and a is not a unit. If a | b, where
ak = b and k is not a unit, then a is called a proper divisor of b.


If R is a ring in which Theorem 7.5.4 doesn't apply, such as those in Exercise 4 from
Section 7.1, then Definition 7.5.5 cannot be unambiguously applied to all pairs a, b ∈ R.
For it's possible to have ak₁ = b and ak₂ = b in a ring R where k₁ is a unit in R but k₂
is not. Here's an illustration. In Z_{2×2},

[…] (7.52)

Now […] is a unit in Z_{2×2}, for

[…] (7.53)

while […] is not a unit in Z_{2×2} (Exercise 4). Finding the multiplicative inverse of a
matrix is something you'll see in linear algebra.

Multiplicative cancellation in a domain is a logical consequence of the principle of 
zero products. We could have defined a domain as a commutative ring with unity where 
multiplicative cancellation holds for nonzero elements, then shown that the principle of 
zero products follows from that. In a commutative ring, the principle of zero products 
and multiplicative cancellation are logically equivalent, as the following theorem says 
(Exercise 5). 




Theorem 7.5.6. Suppose R is a commutative ring with unity such that ac = bc implies
a = b for all a, b, c ∈ R with c ≠ 0. Then R is a domain.


If a commutative ring with unity has nonzero characteristic, then the only way it can 
be a domain is if the characteristic is prime (Exercise 6). 


Theorem 7.5.7. If D is a domain and char D = n ≠ 0, then n is prime.


With Theorems 7.2.18, 7.5.1, and 7.5.7, we see that Zₙ is a domain if and only if n is
prime. With the following theorem, which you'll prove in Exercise 7, it follows that Zₚ is
a field if and only if p is prime.


Theorem 7.5.8. A finite integral domain is a field. 


Proving Theorem 7.5.8 requires very little that hasn't already been done. Since a
domain is a commutative ring with unity, the only thing left in showing it's a field is the
existence of multiplicative inverses. With Theorem 7.5.8, we see that Zₚ is a field if p is
prime.
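The statements about Zₙ above lend themselves to a brute-force check. The sketch below lists zero divisors of Zₙ, compares "no zero divisors" against primality for small n, and finds inverses in Z₁₁ by search; the range of n, the choice p = 11, and all function names are assumptions made for illustration.

```python
def zero_divisors(n):
    """Nonzero a in Z_n with ab = 0 for some nonzero b in Z_n."""
    return [a for a in range(1, n) if any((a * b) % n == 0 for b in range(1, n))]


def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    return n >= 2 and all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))


def inverses(p):
    """Map each nonzero a in Z_p to the b with ab = 1, found by exhaustive search."""
    return {a: next(b for b in range(1, p) if (a * b) % p == 1) for a in range(1, p)}


# Z_n has no zero divisors exactly when n is prime (checked for 2 <= n < 40).
domain_iff_prime = all((zero_divisors(n) == []) == is_prime(n) for n in range(2, 40))

print("zero divisors of Z_12:", zero_divisors(12))
print("domain iff prime for n < 40:", domain_iff_prime)
print("inverses in Z_11:", inverses(11))
```

The search in `inverses` succeeds for every nonzero element only because 11 is prime; for composite n the `next` call would raise `StopIteration` at the first zero divisor.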

In the domain Z, we showed in Theorem 2.7.11 that if a | b and b | a, then a = ±b.
And, coincidentally, ±1 are the units of Z. In a general domain, we say that a and b are
associates if a | b and b | a. Since this definition does not apply to zero, we declare zero
to be an associate of itself. Notice that this declaration and the definition of associate
prevent zero from having any other associates.


Example 7.5.9. In Z[i], 1 +i and 1 — i are associates (Exercise 9). 


You'll show the following in Exercise 10: 


Theorem 7.5.10. Suppose D is a domain, and a, b ∈ D. Then a and b are associates if and
only if there exist units u, v ∈ D such that a = ub and b = va.


You should feel an equivalence relation coming on (Exercise 11). 


Theorem 7.5.11. Let D be a domain, and define a ~ b if a is an associate of b. Then ~ is
an equivalence relation on D.


And what do you think the equivalence class of the unity element consists of ? You'll 
prove the following in Exercise 12: 


Theorem 7.5.12. If D is a domain and ~ is the equivalence relation of association, then 
[e] is the set of units of D. 


If a and b are associates, then there ought to be some senses in which they are
interchangeable. One example of how this is true is in Z, where (6) = (−6). Associates
generate the same principal ideal. The best way to show this is first to prove the following
(Exercise 13). Its corollary is immediate.


Theorem 7.5.13. Suppose D is a domain and a, b ∈ D. Then a | b if and only if (b) ⊆ (a).


Corollary 7.5.14. Suppose D is a domain and a, b ∈ D. Then (a) = (b) if and only if a
and b are associates.


If a is a proper divisor of b, then by Theorems 7.5.4 and 7.5.10, a and b are not
associates. So (b) ⊆ (a) by Theorem 7.5.13, but (a) ≠ (b) by Corollary 7.5.14. Thus we
have the following:


Corollary 7.5.15. If D is a domain and a is a proper divisor of b in D, then (b) ⊊ (a).


Now let’s look at the relationship between prime elements and prime ideals. First, 
the following holds in a domain (Exercise 14). 


Theorem 7.5.16. If D is a domain and (p) is a prime ideal in D, then p is prime in D. 


But what about the converse? If p ∈ D is a prime element, must it generate a prime
ideal? In Z, the answer is yes. Thanks to Theorem 2.7.15, if p ∈ N is prime, then (p) is a
prime ideal in Z. In Z, however, Theorem 2.7.15 depends on the existence of gcds, which
itself depends on the WOP. There are some domains where strangely enough a prime
element generates an ideal that isn't prime, and where a prime p can satisfy p | ab, while
p ∤ a and p ∤ b. The next example presents some of the building blocks of an integral
domain to illustrate these possibilities. You'll provide most of the details in Exercise 16.
We'll return to this example in Section 7.6.


Example 7.5.17. The ring Z[√−5] is a domain because it's a subring of C and contains
1. However, we can show the following:

1. 3 and 2 ± √−5 are prime in Z[√−5].

2. 2 ± √−5 ∉ (3).

3. There exists a prime element p ∈ Z[√−5] such that (p) is not a prime ideal.

4. There exist a, b, p ∈ Z[√−5] where p is prime in Z[√−5], p | ab, but p ∤ a and p ∤ b.

First note

9 = 3 · 3 = (2 + √−5)(2 − √−5). (7.54)

We show here that 3 is prime in Z[√−5] and leave all the other claims for you to verify
in Exercise 16. Suppose

3 = (a + b√−5)(c + d√−5), (7.55)

where a, b, c, d ∈ Z. We'll show that one of these two factors must be a unit, which
by Theorem 7.3.2 must be ±1. Multiplying out the factors and using the definition of
equality in Z[√−5] yields

ac − 5bd = 3 and ad + bc = 0. (7.56)

Proceeding as in the proof of Theorem 7.3.2, we square each equation, multiply the latter
by 5, add and factor to have

(a² + 5b²)(c² + 5d²) = 9. (7.57)

Since both factors in Eq. (7.57) are positive integers, both must be in the set {1, 3, 9}.
Considering each possibility reveals either a contradiction or the fact that one of the
factors is a unit. Thus 3 is prime in Z[√−5].
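The case analysis on Eq. (7.57) comes down to asking which integer pairs (a, b) satisfy a² + 5b² = 1, 3, or 9. The sketch below searches a small window for each value and also confirms Eq. (7.54); representing a + b√−5 as the pair (a, b), the search bound, and the function names are assumptions made for illustration.

```python
def mul(x, y):
    """Multiply a + b*sqrt(-5) and c + d*sqrt(-5), each stored as an integer pair."""
    a, b = x
    c, d = y
    return (a * c - 5 * b * d, a * d + b * c)


def norm_solutions(target, bound=10):
    """Integer pairs (a, b) in the window with a^2 + 5b^2 == target."""
    return [
        (a, b)
        for a in range(-bound, bound + 1)
        for b in range(-bound, bound + 1)
        if a * a + 5 * b * b == target
    ]


print("Eq. (7.54):", mul((2, 1), (2, -1)))
for target in (1, 3, 9):
    print(target, norm_solutions(target))
```

Any solution must have a² ≤ 9 and 5b² ≤ 9, so the window misses nothing. No pair gives the value 3, so one factor in Eq. (7.57) equals 1, which forces that factor of 3 to be ±1.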


In D[t], the polynomial ring over a domain, the degree of a product of polynomials
behaves in a more predictable way than in R[t] for a ring R. You'll prove the following
theorems in Exercises 17 and 18.


Theorem 7.5.18. Suppose D is a domain and f, g ∈ D[t] are nonzero polynomials. Then


1. deg(fg) = deg f + deg g.
2. If f | g, then deg f ≤ deg g.


Theorem 7.5.19. If D is a domain, then so is D[t]. 
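For contrast with Eq. (7.42), the same two polynomials can be multiplied over the domain Z, where the leading coefficients cannot cancel. This is a minimal sketch, assuming coefficient lists with the constant term first; `poly_mul` is an illustrative helper, not notation from the text.

```python
def poly_mul(f, g):
    """Multiply two polynomials over Z (coefficient lists, constant term first)."""
    prod = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            prod[i + j] += a * b
    return prod


f = [3, 0, 2]  # 2t^2 + 3
g = [3, 3]     # 3t + 3
h = poly_mul(f, g)

print(h)  # coefficients of 6t^3 + 6t^2 + 9t + 9
print("deg(fg) =", len(h) - 1, "and deg f + deg g =", (len(f) - 1) + (len(g) - 1))
```

The leading coefficient of the product is the product of the leading coefficients, which is nonzero in a domain, so the degrees add.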


In Exercise 19 you'll determine precisely which polynomials in D[t] are units. Then
you'll see more clearly what it means for a polynomial in D[t] to be prime. In a polynomial
ring, we generally use the word irreducible instead of prime to refer to such a polynomial.


Example 7.5.20. In Exercise 20, you'll show that f = t² − 2 is irreducible in Z[t].
However, if we write Z[√2, t] to mean the polynomial ring over the domain Z[√2],
then f is reducible.


Example 7.5.21. Let f = 2t² − 4. In Exercise 19, you'll look into the reducibility of f
in Z[t] and Q[t].


If f = aₙtⁿ + ··· + a₀ is a polynomial in Z[t] such that deg f ≥ 1, we call
gcd(aₙ, aₙ₋₁, ..., a₀) the content of f. If the content of f is one, we say that f is primitive.
Insight from Example 7.5.21 should make your proof of the following immediate (Exercise 21).


Theorem 7.5.22. Suppose f ∈ Z[t] is irreducible and deg f ≥ 1. Then f is primitive.


Example 7.5.21 and Exercise 19 reveal that Theorem 7.5.22 does not apply in Q[t].
That is, if a polynomial with integer coefficients is irreducible when viewed as an
element of Z[t], then it is primitive. However, 2t + 6 is irreducible in Q[t], but has content 2.
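The content of a polynomial in Z[t] is a single gcd computation over its coefficients. The sketch below uses Python's standard `math.gcd` folded over a coefficient list (constant term first); the function names `content` and `is_primitive` are illustrative.

```python
from functools import reduce
from math import gcd


def content(coeffs):
    """gcd of the integer coefficients of a nonzero polynomial."""
    return reduce(gcd, (abs(c) for c in coeffs))


def is_primitive(coeffs):
    """A polynomial is primitive when its content is one."""
    return content(coeffs) == 1


print(content([6, 2]))          # 2t + 6
print(content([15, 9, 6]))      # 6t^2 + 9t + 15
print(is_primitive([-2, 0, 1])) # t^2 - 2
```

A content larger than one can be factored out as an integer constant, which is a non-unit in Z[t] but a unit in Q[t]; that is the distinction the paragraph above draws for 2t + 6.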


EXERCISES 


1. Prove Theorem 7.5.1: If p ∈ N is prime, then Zₚ is a domain.⁴


2. Prove Theorem 7.5.3: If D is a domain, and if ac = bc and c ≠ 0, then a = b.


3. Prove Theorem 7.5.4: Suppose D is a domain and a | b in D. Then the k ∈ D such
that ak = b is unique.


4. Show that […] is not a unit in Z_{2×2}.


⁴See Theorem 2.7.15.
5. Prove Theorem 7.5.6: Suppose R is a commutative ring with unity such that ac = bc
implies a = b for all a, b, c ∈ R with c ≠ 0. Then R is a domain.


6. Prove Theorem 7.5.7: If D is a domain and char D = n ≠ 0, then n is prime.


7. Prove Theorem 7.5.8: A finite integral domain is a field.⁶
8. Find multiplicative inverses for all nonzero elements of Z₇.

9. Show that 1 + i and 1 − i are associates in Z[i].


10. Prove Theorem 7.5.10: Suppose D is a domain, and a, b ∈ D. Then a and b are
associates if and only if there exist units u, v ∈ D such that a = ub and b = va.⁷


11. Prove Theorem 7.5.11: Let D be a domain, and define a ~ b if a is an associate of b.
Then ~ defines an equivalence relation on D.


12. Prove Theorem 7.5.12: If D is a domain and ~ is the equivalence relation of
association, then [e] is the set of units of D.


13. Prove Theorem 7.5.13: Suppose D is a domain and a, b ∈ D are nonzero. Then a | b
if and only if (b) ⊆ (a).


14. Prove Theorem 7.5.16: If D is a domain and (p) is a prime ideal in D, then p is
prime in D.


15. Is 2 prime in Z[√2]? Explain.

16. Verify the remaining claims from Example 7.5.17:

(a) 2 + √−5 is prime in Z[√−5]. (The proof for 2 − √−5 would be similar.)

(b) 2 + √−5 ∉ (3). (The proof for 2 − √−5 would be similar.)

(c) There exists a prime element p ∈ Z[√−5] such that (p) is not a prime ideal.

(d) There exist a, b, p ∈ Z[√−5] where p is prime in Z[√−5], p | ab, but p ∤ a and p ∤ b.


17. Prove Theorem 7.5.18: Suppose D is a domain and f, g ∈ D[t] are nonzero
polynomials. Then


(a) deg(fg) = deg f + deg g.
(b) If f | g, then deg f ≤ deg g.


18. Prove Theorem 7.5.19: If D is a domain, then so is D[t].

19. Let D be a domain whose set of units is U, and let K be a field.


(a) What are the units of D[t]?
(b) What are the units in K[t]?
(c) Is f = 2t² − 4 reducible in Z[t]? In Q[t]?


20. Explain why f = t² − 2 is irreducible in Z[t] but not in Z[√2, t].


⁵See Exercise 9 from Section 7.2.
⁶To find a⁻¹, define f(x) = ax and apply Exercise 4 from Section 3.3.
⁷Don't forget zero.




21. Prove Theorem 7.5.22: Suppose f ∈ Z[t] is irreducible and deg f ≥ 1. Then f is
primitive.


22. Suppose R is a ring and let r ∈ R be nonzero. Define f : R → R by f(x) = rx. Show
that f is not necessarily one-to-one. What additional condition on R guarantees f
is one-to-one? Prove.


23. Suppose R and S are both domains. Does it follow that R x S is a domain? 


7.6 UFDs and PIDs 


7.6.1 Unique factorization domains 


According to Theorems 2.4.8 and 2.7.18, every natural number n ≥ 2 has a unique
factorization into natural number primes. When we consider Z, the implication of these
theorems should be clear. For n ∈ Z, if n ≥ 2, we lose uniqueness in most cases in that
we could introduce some negative signs here and there. But this is the only way we lose
uniqueness. Even then, the prime factors in two different factorizations can be paired up
as associates of each other (±p), allowing us to say that n ≥ 2 has a prime factorization
in Z that is unique up to order and association of the factors. For n ≤ −2, applying
the same principle to −n allows us to say that every nonzero integer that is not a unit
has a factorization into prime integers, and this factorization is unique up to order and
association of the factors. Some domains have this same property: that every nonzero
element that is not a unit can be written as a product of prime elements of the domain,
uniquely up to order and association of the factors. A domain with this feature is called
a unique factorization domain, or UFD for short. Right away we see that a field is a UFD,
for every nonzero element is a unit.

In our progression from more general to more specialized rings, UFDs are the point 
where we can show that any two nonzero elements have a greatest common divisor, 
unique up to association. We'll adapt the definition of gcd in Definition 2.7.12 to make it 
applicable to a general domain, then take a passing glance at how you show existence and 
some sort of uniqueness of gcd in a UFD. The first place we actually need the existence 
of the gcd is in a PID, where it is a fairly easy thing to show. 

An important example of a UFD that we'll merely make reference to right now is 
Z[t]. To verify that Z[t] is a UFD takes some mathematical machinery that we won’t 
create until Section 7.7, where we study Q[t] in some depth. While Z[t] is a subring of
Q[t], it is not as specialized a ring because there are limitations on coefficients in Z[t]
that do not apply in Q[t]. However, there are some features Q[t] has that we need to
apply to elements of Z[t] to show it’s a UFD.

Since a UFD is by definition a domain, we do not need a theorem claiming that a UFD
is a domain. However, not all domains are UFDs. For example, Z[√−5] is a domain, and
Eq. (7.54) shows that 9 has two distinct factorizations into prime elements in Z[√−5].
Thus Z[√−5] is not a UFD.
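
Eq. (7.54) is not reproduced in this part of the text, so the following sketch assumes it is the standard pair of factorizations 9 = 3 · 3 = (2 + √−5)(2 − √−5). The pair representation (a, b) for a + b√−5 and the helper names `mul` and `norm` are illustrative choices, not notation from the text. The check uses the function N(a + b√−5) = a² + 5b², which is multiplicative: each factor has N = 9, so a proper factor of any of them would need N = 3, and the search shows no such element exists.

```python
def mul(x, y):
    """Multiply a + b*sqrt(-5) by c + d*sqrt(-5); each is stored as an integer pair."""
    a, b = x
    c, d = y
    return (a * c - 5 * b * d, a * d + b * c)

def norm(x):
    """N(a + b*sqrt(-5)) = a^2 + 5*b^2."""
    a, b = x
    return a * a + 5 * b * b

three = (3, 0)
u, v = (2, 1), (2, -1)            # 2 + sqrt(-5) and 2 - sqrt(-5)

# Both products equal 9 = (9, 0).
products = (mul(three, three), mul(u, v))

# An element of norm 3 would need a^2 <= 3 and 5*b^2 <= 3, so this box is exhaustive.
norm_three = [(a, b) for a in range(-2, 3) for b in range(-1, 2)
              if norm((a, b)) == 3]
```

Since the only elements with N = 1 are ±1, the factor 3 is not an associate of 2 ± √−5, so the two factorizations really are different.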

With a very minor adjustment in the wording, we can define gcd in a domain. 


Definition 7.6.1. Suppose D is a domain, and let a, b ∈ D be nonzero. Suppose g ∈ D
has the following characteristics:


(D1) g | a and g | b.


(D2) If h is any element of D with the properties that h | a and h | b, then it must be that
h | g also.


Then g is called a greatest common divisor of a and b, and is denoted gcd(a, b).


In Section 2.7, we said that a practical way you might find gcd(a, b) in N is by breaking
down a and b into their prime factorizations, taking the appropriate number of 2’s, 3’s,
etc. and building the gcd from that. We didn’t prove that such a trick produces a natural
number that satisfies D1–D2 because the WOP provided an easier and more useful way.
Alas, in a general UFD, proving the existence of gcd(a, b), unique up to association, must 
be done by exploiting the unique prime factorizations of a and b in that somewhat sloppy 
way. We’ll state the theorem here, followed by some details of the proof that might make 
the notation minimally sloppy. You'll finish the proof in Exercise 1. 


Theorem 7.6.2. Suppose D is a UFD, and a, b ∈ D are nonzero. Then there exists g ∈ D
that satisfies D1–D2, and if g₁ and g₂ both satisfy D1–D2, then g₁ and g₂ are associates.


Here are some suggestions on how to prove Theorem 7.6.2 and keep the notation from
getting outrageously complicated. If we break a and b down into prime factorizations,
then let $\{p_1, p_2, \ldots, p_n\}$ be all the primes that appear in either a or b, we can write


$$a = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_n^{\alpha_n} \quad \text{and} \quad b = p_1^{\beta_1} p_2^{\beta_2} \cdots p_n^{\beta_n}, \tag{7.58}$$


where some of the $\alpha_k$ and $\beta_k$ might be zero. If we let $\gamma_k = \min\{\alpha_k, \beta_k\}$ for all $1 \le k \le n$,
we claim that $g = p_1^{\gamma_1} p_2^{\gamma_2} \cdots p_n^{\gamma_n}$ satisfies D1–D2. Since $\gamma_k = \min\{\alpha_k, \beta_k\}$, we know that
$\gamma_k \le \alpha_k$ and $\gamma_k \le \beta_k$ for all k. This should make it easy to show that g has property D1. To
show that g has property D2, suppose h | a and h | b. Then there exist $k_1, k_2 \in D$ such that
$hk_1 = a$ and $hk_2 = b$. If the unique factorization of h is written as $h = q_1^{\delta_1} \cdots q_m^{\delta_m}$,
then the prime factorizations of $hk_1$ and $hk_2$ must agree with Eqs. (7.58), so that some
possible reordering of the $p_i$ allows us to say that $q_k = p_k$ for all $1 \le k \le m \le n$. If $m < n$,
then letting $\delta_k = 0$ for $m + 1 \le k \le n$ gives us $h = p_1^{\delta_1} \cdots p_n^{\delta_n}$. Now, the fact that h | a
implies that $\delta_k \le \alpha_k$ for all $1 \le k \le n$. Similarly, h | b implies that $\delta_k \le \beta_k$ for all
$1 \le k \le n$. Thus $\delta_k \le \gamma_k$ for all $1 \le k \le n$, and by constructing the appropriate $l \in D$, we
can show hl = g. Thus g has property D2. Showing that g is unique up to association
is surprisingly easy. In fact, the way you proved uniqueness of gcd(a, b) in N should
translate directly over to D to imply that any $g_1$ and $g_2$ that satisfy D1–D2 must be
associates.
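
To see the minimum-of-exponents recipe in action, here is a small sketch for the familiar case of natural numbers. The function names and the trial-division factoring routine are illustrative choices, not part of the text, and the final line simply compares the result with the gcd from Python's standard library.

```python
import math
from collections import Counter

def prime_factorization(n):
    """Return a Counter {prime: exponent} for a natural number n >= 1 (trial division)."""
    factors = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] += 1
            n //= p
        p += 1
    if n > 1:
        factors[n] += 1
    return factors

def gcd_from_factorizations(a, b):
    """Build g as the product of p**min(alpha, beta) over all primes in a or b."""
    fa, fb = prime_factorization(a), prime_factorization(b)
    g = 1
    for p in set(fa) | set(fb):
        g *= p ** min(fa[p], fb[p])   # a Counter gives exponent 0 for an absent prime
    return g

# 360 = 2^3 * 3^2 * 5 and 84 = 2^2 * 3 * 7, so the recipe gives 2^2 * 3.
assert gcd_from_factorizations(360, 84) == math.gcd(360, 84)
```

Allowing exponent zero for primes that appear in only one of the two numbers is what lets both factorizations run over the same list of primes.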


7.6.2 Principal ideal domains 


Since a domain is commutative and has a unity element, left and right ideals are identical, 
and the principal ideal generated by a € D can always be written as Da (or aD). 




Furthermore, from Exercise 14 in Section 7.4, if A = {a₁, …, aₙ}, then

(A) = {d₁a₁ + d₂a₂ + ⋯ + dₙaₙ : dₖ ∈ D for all 1 ≤ k ≤ n}. (7.59)


Let’s write this form of (A) in Eq. (7.59) as Da₁ + Da₂ + ⋯ + Daₙ.

In some domains, principal ideals are the only ones there are. Every ideal will have a 
single generator. If D is a domain such that every ideal in D is principal, then D is called 
a principal ideal domain, or a PID for short. Right away we see that a field K is a PID, 
for Exercise 16 from Section 7.4 says that the only ideals of K are {0} and K itself, the 
former generated by zero, the latter by e. You've already seen an example of a domain that 
is not a PID, but before we point it out, we'll concentrate on understanding why some of 
the domains we’re acquainted with are PIDs, and results for PIDs in general. Then the 
example of a domain (actually a UFD) that is not a PID will point itself out. First let’s 
review the integers a bit. 

After we defined divisibility and gcd in Z in Chapter 2, we showed that gcd(a, b)
exists for all a, b ∈ Z\{0} (Theorem 2.7.13). Furthermore, it is the smallest positive
element of


S = {ma + nb : m, n ∈ Z}. (7.60)


Thus our proof that gcd(a, b) exists in Z depended on the WOP of N to give us a number 
that we could show satisfies properties D1—D2. Furthermore, in showing that this smallest 
element of S divides both a and b, you employed the division algorithm. And you proved 
the division algorithm with the help of the WOP. 

Notice that S in Eq. (7.60) is precisely Za + Zb, the ideal generated by {a, b}. Thus,
gcd(a, b) is the smallest positive element of (a, b). Moreover, in Exercise 18 of Section 7.4, you
showed that {a, b} and gcd(a, b) generate the same ideal in Z, so that the ideal generated 
by a two-element set in Z is really principal after all. Once again, it’s the WOP at work, 
leading you to the smallest positive element of an ideal, and allowing you to show that it 
generates the whole ideal. By looking in this same direction, you can show the following 
in Exercise 2: 


Theorem 7.6.3. The integers are a PID. 


The way you'll attack this deserves a comment. To show any particular domain D is a
PID, you must show that every ideal I can be written as Da, where a ∈ I is a generator of
I that you must find. If I = {0}, that’s a piece of cake. However, if I is an arbitrary ideal
of Z with a nonzero element, then it has a positive element. Then the WOP of N comes in and
gives you a generator for I on a silver platter (details notwithstanding). In Z, the WOP
is the basis for its being a PID and for its containing gcds. However, the WOP depends 
on < as a measure of size of elements in N. 
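
The role of the smallest positive element can be watched directly in Z. The following brute-force sketch lists the combinations ma + nb for coefficients in a bounded window and picks out the smallest positive one. The function name and the `bound` parameter are assumptions made for illustration, and a bounded search is only a numerical illustration, not a proof.

```python
import math

def smallest_positive_combination(a, b, bound=50):
    """Smallest positive value of m*a + n*b with |m|, |n| <= bound (brute force)."""
    values = {m * a + n * b
              for m in range(-bound, bound + 1)
              for n in range(-bound, bound + 1)}
    return min(v for v in values if v > 0)

def generates_sampled_ideal(a, b, bound=50):
    """Check that every sampled element of Za + Zb is a multiple of the smallest positive one."""
    g = smallest_positive_combination(a, b, bound)
    return all((m * a + n * b) % g == 0
               for m in range(-bound, bound + 1)
               for n in range(-bound, bound + 1))

# The smallest positive element of Z*12 + Z*42 agrees with gcd(12, 42).
assert smallest_positive_combination(12, 42) == math.gcd(12, 42)
```

The second function mirrors the step where the division algorithm shows the smallest positive element divides everything in the ideal.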

Another example of a PID is Q[t]. Rather than show this directly, we'll show in 
Section 7.7 that Q[t] is a Euclidean domain, and that all Euclidean domains are PIDs. 


Example 7.6.4. The ring Q_OD from Example 7.1.9 is a PID. In Exercise 3, you'll prove
this by applying the WOP in a creative way to an arbitrary ideal in Q_OD to find a generator.


Now let’s suppose we’re working in a domain where we don’t necessarily have a 
measure of element size, so that we have no way to apply the WOP. With gcd defined as 
in Definition 7.6.1, and with the assumption that all ideals are principal, you can prove
the following (Exercise 4): 


Theorem 7.6.5. Let D be a PID, and a, b ∈ D be nonzero. Then there exists gcd(a, b) ∈ D.


Instead of going to (a, b) and taking its smallest element to be gcd(a, b) as you did 
in Z, you simply use the fact that (a, b) has some generator. Since (a, b) = Da + Db, we 
have the following: 


Corollary 7.6.6. If D is a PID and a, b ∈ D are nonzero, then there exist m, n ∈ D such
that gcd(a, b) = ma + nb.
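
In Z the coefficients m and n of Corollary 7.6.6 can actually be computed. The sketch below uses the extended Euclidean algorithm, a standard technique that the text has not introduced; the function name `bezout` is an illustrative choice. It returns the positive gcd, one of the two associates ±g.

```python
import math

def bezout(a, b):
    """Return (g, m, n) with g = gcd(a, b) >= 0 and g == m*a + n*b."""
    # Invariant: old_r == old_m*a + old_n*b and r == m*a + n*b at every step.
    old_r, r = a, b
    old_m, m = 1, 0
    old_n, n = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_m, m = m, old_m - q * m
        old_n, n = n, old_n - q * n
    if old_r < 0:                      # switch to the positive associate
        old_r, old_m, old_n = -old_r, -old_m, -old_n
    return old_r, old_m, old_n

g, m, n = bezout(240, 46)
assert g == math.gcd(240, 46) == m * 240 + n * 46
```

The pair (m, n) is not unique; the algorithm just produces one linear combination that works.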


Unfortunately, Theorem 7.6.5 says nothing about the uniqueness of gcd(a, b) but 
there is something close (Exercise 5). 


Theorem 7.6.7. If g₁ and g₂ are both greatest common divisors of a and b in a PID, then
g₁ and g₂ are associates.


Thus even if there are several gcds of a and b, they’re all unit multiples of each other. 
If g is one gcd of a and b, then everything in its equivalence class of associates is, too. 
If [e] is the set of gcds of a and b, then we say a and b are relatively prime, and note 
that if a and b are relatively prime, then there exists a linear combination such that 
ma + nb = e.

Now let’s find that example of a domain that is not a PID. In any commutative ring, a 
maximal ideal is prime (Theorem 7.4.20). However, there exist prime ideals that are not 
maximal, such as (t) in Z[t] (Exercise 25 from Section 7.4). And (t) is a proper subset
of (2, t), the ideal from Example 7.4.2 of all polynomials with even constant term. The 
following theorem lets us close in for the kill (Exercise 6). 


Theorem 7.6.8. Suppose D is a PID and I is a prime ideal in D. Then I is maximal.


Corollary 7.6.9. Z[t] is not a PID.


Proof: (t) is prime but not maximal in Z[t]. ∎


By Theorem 7.6.8, if an ideal in a domain is prime but not maximal, the domain
is not a PID. Since (t) is prime but not maximal in Z[t], then Z[t] is not a PID. In
Exercise 7 you'll show explicitly that (2, t) is not principal by demonstrating that any
supposed generator fails. In Section 7.7, we'll show that Z[t] is a UFD. Now we want to
show that a PID is a UFD. We have to take several steps to get there, but fortunately a lot 
of the uniqueness work has already been done for N in Section 2.7 and translates over to 
a PID almost word for word. The sticky part in showing existence of prime factorizations 
in a PID stems from the fact that the defining characteristic of a PID concerns its ideals, 
while the defining characteristic of a UFD concerns its elements. The link between them 
is the following: All ideals in a PID have a generator, and the way these ideals contain 
each other as subsets is tied by Theorem 7.5.13 to divisibility of their generators. Since 
in a PID a | b if and only if (a) ⊇ (b), the way an element breaks down into factors is
directly linked to the way principal ideals stack. So let’s make an important observation 
about the way ideals in a PID cannot stack. 




Let D be a PID and {Iₙ}, n ∈ N, a family of ideals such that Iₙ ⊆ Iₙ₊₁ for all n ∈ N.
By Theorem 7.4.8, the union I of all the Iₙ is an ideal. Since D is a PID, I is principal, and has a
generator a ∈ I. Now since a is in the union of the Iₙ, there exists n ∈ N such that a ∈ Iₙ. We claim
that for all k ≥ n, Iₖ = Iₙ. For if this is not true, then there exists some k > n such that
Iₖ\Iₙ is nonempty. Let x ∈ Iₖ\Iₙ. Since x ∈ Iₖ, then x ∈ I, so that x = ay for some
y ∈ D. Also, since Iₙ is an ideal and a ∈ Iₙ, then ay ∈ Iₙ. But ay = x, so x ∈ Iₙ, which
is a contradiction. Thus it is impossible that Iₖ ⊋ Iₙ.

The upshot of all this is that if you’re in a PID and have a set of hypothesis conditions
that allows you to create a family of ideals {Iₙ}, n ∈ N, where Iₙ is a proper subset of Iₙ₊₁
for all n ∈ N, then you’ve produced a contradiction, and at least some of the hypothesis
conditions are false in a PID.
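
In Z this stacking behavior is easy to watch, because (a) ⊆ (b) exactly when b divides a. The sketch below starts from the ideal (n) and climbs through strictly larger principal ideals by dividing out one prime at a time. The function name is an illustrative choice, and the example only illustrates why a chain of strictly growing ideals in Z must stop.

```python
def ascending_chain(n):
    """Generators g0, g1, ... with (g0) properly inside (g1) properly inside ..., g0 = n."""
    chain = [n]
    while n > 1:
        # The smallest divisor greater than 1 is automatically prime.
        p = next(d for d in range(2, n + 1) if n % d == 0)
        n //= p                  # (old n) is a proper subset of (old n // p)
        chain.append(n)
    return chain

chain = ascending_chain(360)
# Each generator divides the one before it, and the chain ends at 1, i.e. at (1) = Z.
```

Starting from 360 = 2³ · 3² · 5 the chain has exactly six proper steps, one for each prime factor counted with multiplicity, after which there is nowhere left to climb.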

Now let’s show that every nonzero, nonunit element of a PID can be written as a product of prime
elements, and that this factorization is unique up to order and association of the factors.
To prove the former, suppose D is a PID, and suppose there exists a nonzero, nonunit a ∈ D that
cannot be written as a product of primes. Then a is not prime itself. Thus there exist a₁, b₁ ∈ D such
that a = a₁b₁ and neither a₁ nor b₁ is a unit. Furthermore, since a cannot be written as
a product of primes, then a₁ or b₁ (or both) cannot either. Without loss of generality, we
may assume it’s a₁, which means that a₁ is not prime. And notice, since b₁ is not a unit,
(a) ⊂ (a₁) by Corollary 7.5.15.

Now since a₁ is not prime, there exist a₂, b₂ ∈ D such that a₁ = a₂b₂ and neither
a₂ nor b₂ is a unit. Once again, since a₁ cannot be written as a product of primes, either
a₂ or b₂ cannot either. Assume it’s a₂, and note that (a₁) ⊂ (a₂) because b₂ is not a unit.
Continuing in the same way, we can generate {aₙ}, n ∈ N, in D such that (aₙ) ⊂ (aₙ₊₁) for all n, which
is a contradiction. Thus the assumption that there exists a nonzero, nonunit a ∈ D that cannot be
written as a product of primes is false, and all nonzero, nonunit elements of D have a factorization
into primes. With all this said, we have proved the following:


Theorem 7.6.10. If D is a PID and a ∈ D is nonzero and not a unit, then there exist primes
p₁, p₂, …, pₙ ∈ D such that a = p₁p₂⋯pₙ.


To show uniqueness of the prime factorization, almost all the work translates directly 
over from our work in Section 2.7. The first step is analogous to Theorem 2.7.14, but the 
proof comes out a little differently because primes in a PID are defined in language that 
differs from that in N. 


Theorem 7.6.11. If D is a PID, and a, p ∈ D are such that p is prime and a ≠ 0, then
either p | a or p and a are relatively prime.


Proof: Let g = gcd(a, p). Since g | p and p is prime, then either g is a unit, or it’s
an associate of p. If g is a unit, then writing am + np = g and multiplying both
sides through by g⁻¹ reveals that a and p are relatively prime. If g is an associate
of p, then gu = p for some unit u. The fact that g | a means that pu⁻¹ | a, so that
p | a. ∎


With Theorem 7.6.11 in hand, the proofs of the next two theorems become identical 
to those for Theorems 2.7.15 and 2.7.17: 


Theorem 7.6.12. If D is a PID and a, b, p ∈ D are such that p is prime, a and b are
nonzero and p | ab, then either p | a or p | b.


Theorem 7.6.13. Suppose D is a PID and p, a₁, a₂, …, aₙ ∈ D are such that p is prime
and aₖ ≠ 0 for all 1 ≤ k ≤ n. If p | a₁a₂⋯aₙ, then there exists k (1 ≤ k ≤ n) such that
p | aₖ.


The uniqueness result is analogous to Theorem 2.7.18, but the proof doesn’t come 
out exactly the same because of possible association of the factors. You'll provide the new 
proof in Exercise 10. Exercise 9 will come in handy along the way. 


Theorem 7.6.14. If D is a PID and a ∈ D is nonzero and not a unit, then the prime
factorization of a from Theorem 7.6.10 is unique up to order and association of the factors. 


With Theorems 7.6.10 and 7.6.14, we have the following: 


Theorem 7.6.15. A PID is a UFD. 


Finally, as a direct result of Theorem 7.6.12, we have the following, which you'll prove 
in Exercise 11. 


Theorem 7.6.16. If D is a PID and p ∈ D is prime, then (p) is a prime ideal.


EXERCISES 


1. Prove Theorem 7.6.2: Suppose D is a UFD, and a, b ∈ D are nonzero. Then there
exists g ∈ D that satisfies D1–D2, and if g₁ and g₂ both satisfy D1–D2, then g₁ and
g₂ are associates.


2. Prove Theorem 7.6.3: The integers are a PID. 
3. Show that Q_OD is a PID.⁸


4. Prove Theorem 7.6.5: Let D be a PID, and a, b ∈ D be nonzero. Then there exists
gcd(a, b) ∈ D.


5. Prove Theorem 7.6.7: If g₁ and g₂ are both greatest common divisors of a and b in
a PID, then g₁ and g₂ are associates.


6. Prove Theorem 7.6.8: Suppose D is a PID and I is a prime ideal in D. Then I is
maximal.


7. Show that (2, t) is not a principal ideal of Z[t].


8. Prove Theorem 7.6.12: If D is a PID and a, b, p ∈ D are such that p is prime and
p | ab, then either p | a or p | b.


9. Show that if D is a domain, p ∈ D is prime and u ∈ D is a unit, then pu is also
prime.⁹


⁸ Of all nonzero elements of an ideal I, pick one with the smallest power of 2 in the numerator. Use this
element as a generator.
⁹ If pu = ab is a factorization of pu, then a(bu⁻¹) is a factorization of p.


10. Prove Theorem 7.6.14: If D is a PID and a ∈ D is nonzero and not a unit, then the
prime factorization of a from Theorem 7.6.10 is unique up to order and association
of the factors.


11. Prove Theorem 7.6.16: If D is a PID and p ∈ D is prime, then (p) is a prime ideal.


7.7 Euclidean Domains 


7.7.1 Definition and properties 


Let’s return to the division algorithm for Z (Theorem 2.7.6) to expand it and cast it in 
somewhat different language. In writing b = aq + r, we insisted that a be positive, so
that 0 ≤ r < a is meaningful. We did not have to make this restriction on a. Obviously
a = 0 won't work, but we could have allowed for a < 0 to have a somewhat broader 
theorem. If we wanted to extend Theorem 2.7.6 to include the case a < 0, it would look 
something like this: 


Theorem 7.7.1 (Extended Division Algorithm). Suppose a, b ∈ Z\{0}. Then there exist
unique q, r ∈ Z such that b = aq + r and 0 ≤ r < |a|.


Extending the division algorithm as stated in Theorem 2.7.6 to include the case a < 0
is fairly easy to do. For if a < 0, then −a > 0, and by Theorem 2.7.6, there exist q₁, r₁ ∈ Z
such that b = (−a)q₁ + r₁ and 0 ≤ r₁ < −a. Letting q₂ = −q₁ and r₂ = r₁, we have
b = aq₂ + r₂, and 0 ≤ r₂ = r₁ < −a = |a|. Your proof of uniqueness of q and r from
Theorem 2.7.6 probably did not depend on the sign of a, and would therefore apply for 
this case, too. 
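
The sign-flipping argument translates almost word for word into code. In this sketch the function name `extended_divmod` is an illustrative choice, and Python's built-in `divmod` plays the role of Theorem 2.7.6 for positive divisors, since for a positive divisor it returns a remainder r with 0 ≤ r < a.

```python
def extended_divmod(b, a):
    """Return (q, r) with b == a*q + r and 0 <= r < |a|, for any nonzero integer a."""
    if a == 0:
        raise ZeroDivisionError("a must be nonzero")
    if a > 0:
        return divmod(b, a)        # the positive case: 0 <= r < a
    q1, r1 = divmod(b, -a)         # divide by -a > 0, so b = (-a)*q1 + r1
    return -q1, r1                 # hence b = a*(-q1) + r1 with 0 <= r1 < -a = |a|

q, r = extended_divmod(17, -5)     # 17 = (-5)(-3) + 2
```

Calling Python's `divmod` directly with a negative divisor would give a remainder that is negative or zero, which is why the sign flip is done by hand here.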

With Theorem 7.7.1, we're using absolute value as a measure of the size of integers, 
and saying that any two nonzero integers can be related by breaking one of them (b) 
down into a certain multiple (q) of the other (a), with some possible stuff left over (r),
but where the stuff left over is smaller in size than a. There are other important domains 
where some notion of size can be imposed on the elements, and something akin to the 
division algorithm using that measure of size works for all nonzero elements. Since the 
division algorithm does not involve the zero of the domain, we don’t insist that zero be 
assigned a measure of size. Here is the definition. 


Definition 7.7.2. Suppose D is an integral domain, and suppose there exists a function
d : D\{0} → W with the property that d(a) ≤ d(ab) for all a, b ∈ D\{0}, which is called a
valuation. Suppose also that, for all a, b ∈ D\{0}, there exist q, r ∈ D such that b = aq + r,
and either r = 0 or d(r) < d(a). Then D is called a Euclidean domain (ED).


Definition 7.7.2 deserves a few comments. First, demonstrating that a domain is an 
ED requires the creation of a valuation d that assigns a nonnegative integer size to all 
nonzero elements of the domain. There might be more than one such valuation, but 
we'll point out some features that d has as a result of the requirement that d(a) ≤ d(ab)
for all a, b ∈ D\{0}. Second, given two nonzero elements a and b, either b must be a
multiple of a, in which case r = 0, or just the right distance from a multiple of a so that
writing b = aq + r can be done with r of sufficiently small size.


Example 7.7.3. The integers form an ED. Letting d(a) = |a|, we have that |a| ≤ |ab|
for all a, b ∈ Z\{0}. With Theorem 7.7.1 we have that Z is an ED.


Example 7.7.4. A field K is an ED. If we let d : K\{0} → W be defined by d(x) = 1 for all
x ∈ K\{0}, we see that d(x) = d(xy) for all x, y ∈ K\{0}. Furthermore, for x, y ∈ K\{0},
we may let q = y/x to have y = xq, so that r = 0 is always possible.


Divisibility in a field is trivial because nonzero elements are all multiples of each 
other. Theorem 7.7.6 will reveal that any valuation on K \{0} would have to be constant. 

There is one more classic example of an ED that we want to mention without 
providing any of the proof. The Gaussian integers Z[i] are an ED, and the function
d : Z[i]\{0 + 0i} → W defined by d(a + bi) = a² + b² can be shown to be a valuation
that works. 
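
Although the proof is omitted, it is easy to experiment with division in Z[i]. The sketch below uses a standard technique that is not taken from the text: compute the exact quotient b/a as a complex number with rational parts, then round each part to the nearest integer. Gaussian integers are stored as integer pairs (x, y) for x + yi, and all function names are illustrative.

```python
def norm(z):
    """d(x + yi) = x^2 + y^2."""
    x, y = z
    return x * x + y * y

def gauss_mul(z, w):
    """(x + yi)(u + vi) as an integer pair."""
    (x, y), (u, v) = z, w
    return (x * u - y * v, x * v + y * u)

def round_div(n, d):
    """Nearest integer to n/d for d > 0, i.e. floor(n/d + 1/2)."""
    return (2 * n + d) // (2 * d)

def gauss_divmod(b, a):
    """Return (q, r) with b = a*q + r and either r == (0, 0) or norm(r) < norm(a)."""
    d = norm(a)
    # b / a = b * conj(a) / norm(a); compute the numerator exactly.
    nx, ny = gauss_mul(b, (a[0], -a[1]))
    q = (round_div(nx, d), round_div(ny, d))
    aq = gauss_mul(a, q)
    r = (b[0] - aq[0], b[1] - aq[1])
    return q, r

q, r = gauss_divmod((27, 23), (8, 1))   # divide 27 + 23i by 8 + i
```

Rounding moves each coordinate of the exact quotient by at most 1/2, so the remainder satisfies d(r) ≤ d(a)/2, which is why the remainder always comes out smaller than the divisor. Unlike in Z, the pair (q, r) need not be unique here.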

Before we spend some quality time with one more very important example of an 
ED, let’s derive some results about EDs in general. First, let’s see how to show that an ED 
is a PID. If D is an ED and we pick any ideal I, we must find some a ∈ I where every
element of I is a multiple of a. But that shouldn't be too hard, for a nonzero element a ∈ I such
that d(a) is minimal would probably serve nicely as a generator for I, and the division
algorithm on D ought to be just the right tool to enable us to show it. All that said, you'll 
prove the following in Exercise 1. 


Theorem 7.7.5. An ED is a PID.


If D is an ED, it is by definition an integral domain. Thus, it has a unity element
e. If we pick any nonzero a ∈ D, it follows that d(e) ≤ d(ea) = d(a), so that d(e) is minimal
among all values of d. This fact and the division algorithm should come in handy when
you prove the ⇒ direction of the following in Exercise 2.


Theorem 7.7.6. If D is an ED, then d(u) = d(e) if and only if u is a unit.


Theorem 7.7.6 sheds a little more light on some things we already know. First, since 
absolute value can serve as a valuation on Z, we see that the units in Z are precisely the 
values of x for which |x| = |1|, namely ±1. Also, since every nonzero element of a field K
is a unit, any valuation d must satisfy d(x) = d(e) for all x ∈ K\{0}. Thus, d must be
constant. Conversely, if a valuation on an ED is constant, then every nonzero element is 
a unit, so that the ED is a field. 

As we’ve progressed from rings to domains to UFDs to PIDs to EDs, we claimed 
that there are examples of one structure that do not qualify as an example of the next 
most restrictive structure. In every case up to now, we’ve provided an example with 
complete proof, except for Z[t], which we'll address later in this section. So about now 
you should be asking, “Where’s my example of a PID that’s not an ED?” This is the
one place in our progression from more general to more specialized rings where we’ll
present the example but give only a loose explanation of how the claims about it
would be shown. The classic example of a PID that is not an ED was first
constructed by T. Motzkin in 1949, very recently indeed by mathematical standards. It is
Z[(1 + √−19)/2], the set of all expressions of the form m + n((1 + √−19)/2), where
m, n ∈ Z. For convenience, let’s write α = (1 + √−19)/2, and discuss how one goes
about showing Z[α] is a PID but not an ED.

First let’s address the fact that Z[α] is not an ED. We do this by showing that every
ED has a certain feature, then showing that Z[α] does not have this feature. If D is any
ED that is not a field, then there will exist nonzero, nonunit elements. If d is a valuation
on D, then a nonzero, nonunit element x will satisfy d(x) > d(e). Among all elements
of D, let a be a nonzero, nonunit element for which d(a) is minimal. Such an element is
called a universal side divisor. By the division algorithm on D, any x ∈ D can be written
as x = aq + r, where either r = 0 or d(r) < d(a). Since d(a) is minimal among all
nonzero, nonunit elements of D, we may say that either r = 0 or d(r) = d(e). That is,
if a is a universal side divisor, then every x ∈ D may be written as x = aq + r, where
either r = 0 or r is a unit. Every ED that is not a field will have universal side divisors 
because the valuation won't be constant. 

The next thing we would need to know is what the units are in Z[α]. It turns out that
the only units in Z[α] are ±1. So let’s suppose that Z[α] were an ED. Since the only units
in Z[α] are ±1, Z[α] is not a field. Thus there exists a universal side divisor a ∈ Z[α],
and every x ∈ Z[α] can be written as x = aq + r, where r ∈ {0, ±1}. In particular,
x = 2 must be writable in this way. So aq = 2 − r, and the only possible values of 2 − r
are {1, 2, 3}. Thus, a divides at least one of {1, 2, 3} but, since a is not a unit, a ∤ 1. With
some work, we could show that the only divisors of 2 are {±1, ±2} and the only divisors
of 3 are {±1, ±3}. Thus, a ∈ {±2, ±3}. If we let x = α, however, it turns out that there
is no q ∈ Z[α] for which α = aq + r, given that a ∈ {±2, ±3} and r ∈ {0, ±1}. This is
a contradiction, so Z[α] is not an ED.
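
The last step, that α cannot be written as aq + r, can be checked by machine once elements are stored as integer pairs (m, n) standing for m + nα. The sketch assumes the relation α² = α − 5, which follows from α = (1 + √−19)/2 by direct computation, and it uses the fact that an integer a times p + qα is ap + (aq)α, so an element is a multiple of the integer a exactly when a divides both of its coordinates. The names below are illustrative.

```python
def mul(x, y):
    """(m + n*alpha)(p + q*alpha), reduced with alpha^2 = alpha - 5."""
    (m, n), (p, q) = x, y
    return (m * p - 5 * n * q, m * q + n * p + n * q)

def divisible_by_integer(x, a):
    """True when x = m + n*alpha is a multiple of the nonzero integer a."""
    m, n = x
    return m % a == 0 and n % a == 0

alpha = (0, 1)

# For each candidate side divisor a and each allowed remainder r,
# ask whether alpha - r is a multiple of a.
successes = [(a, r)
             for a in (2, -2, 3, -3)
             for r in (0, 1, -1)
             if divisible_by_integer((alpha[0] - r, alpha[1]), a)]
```

The list `successes` comes out empty because the coefficient of α in α − r is always 1, which neither 2 nor 3 divides. The claims about the units of Z[α] and the divisors of 2 and 3 are not checked here.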

Now let’s address the fact that Z[α] is a PID. First, let’s take the defining characteristic
of an ED and state it first in the original way, then in an altered form. A domain D is an
ED if there exists a valuation d on D\{0} such that for all nonzero a, b ∈ D:


1. There exist q, r ∈ D such that b = aq + r and either r = 0 or d(r) < d(a).


2. Either b is in the ideal generated by a, or it is possible to subtract from b some q 
multiple of a to produce an element r = b — aq whose valuation is smaller than that 
of a, that is, d(r) < d(a). 


Let’s weaken this second form a bit. Suppose D is a domain with a valuation such that for
all nonzero a, b ∈ D, either b is in the ideal generated by a, or there is some linear
combination of a and b whose valuation is smaller than that of a. That is, either b ∈ (a) or there
exist nonzero m, n ∈ D such that d(ma + nb) < d(a). This property is sort of like the
division algorithm, but not quite as strong. In the event that b is not a multiple of a, we don’t
insist that some multiple of a can be subtracted from b to produce an element of small
valuation, but only that some linear combination of a and b has sufficiently small
valuation. A valuation with this feature is called a Dedekind-Hasse norm, and if a domain is such
that there exists a Dedekind-Hasse norm, then we can show the following (Exercise 3):


Theorem 7.7.7. Suppose D is a domain with a valuation d, and with the property that, for
all nonzero a, b ∈ D, either b ∈ (a), or there exist m, n ∈ D such that d(ma + nb) < d(a).
Then D is a PID.


The trick then is to show that Z[α] admits a Dedekind-Hasse norm, so that Z[α]
is a PID. A function d : D\{0} → R that works is d(a + bα) = a² + ab + 5b², for which
we won't provide any details.
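
Without attempting the Dedekind-Hasse property itself, we can at least test numerically that this d behaves like a valuation. The sketch below checks on a small grid that d is multiplicative, d(xy) = d(x)d(y), and at least 1 on nonzero elements, which together give d(x) ≤ d(xy). Elements are integer pairs (a, b) for a + bα, the multiplication rule assumes α² = α − 5, and the function names are illustrative.

```python
def mul(x, y):
    """(a + b*alpha)(c + e*alpha), reduced with alpha^2 = alpha - 5."""
    (a, b), (c, e) = x, y
    return (a * c - 5 * b * e, a * e + b * c + b * e)

def d(x):
    """d(a + b*alpha) = a^2 + a*b + 5*b^2."""
    a, b = x
    return a * a + a * b + 5 * b * b

grid = [(a, b) for a in range(-4, 5) for b in range(-4, 5) if (a, b) != (0, 0)]

multiplicative = all(d(mul(x, y)) == d(x) * d(y) for x in grid for y in grid)
positive = all(d(x) >= 1 for x in grid)
```

A grid check is evidence rather than proof. The general statement follows from writing d(a + bα) as the product of a + bα with its complex conjugate.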


7.7.2 Polynomials over a field 


An important example of an ED is Q[t], and we want to address that now. Every nonzero 
polynomial f € Q[t] has a nonnegative degree, and deg f is precisely the valuation we 
want to use. From Theorem 7.5.18, since deg(fg) = deg f + deg g, and since deg g ≥ 0,
we have that deg f ≤ deg(fg) for all nonzero f, g ∈ Q[t]. Showing Q[t] is an ED then boils
down to proving a sort of division algorithm on Q[t]. Here is precisely the theorem we
need, with uniqueness of q and r to boot. The technique we'll use to get the proof off the
ground should look surprisingly familiar. You'll provide a few of the details in Exercise 4:


Theorem 7.7.8. Suppose f, g ∈ Q[t] are nonzero polynomials. Then there exist unique
q, r ∈ Q[t] such that g = fq + r and either r = 0 or deg r < deg f.


Proof: Choose f, g ∈ Q[t], both nonzero polynomials, and define

S = {g − fq : q ∈ Q[t]}. (7.61)


If the zero polynomial is in S, then we have g = fq for some q ∈ Q[t]. Otherwise
we may let r be any polynomial in S whose degree is minimal. Then r = g − fq
for some q ∈ Q[t], and since Q[t] is closed under addition and multiplication, we
have that r ∈ Q[t]. Thus, we have g = fq + r. To show deg r < deg f, suppose
deg r ≥ deg f. Then we may write


f = aₘtᵐ + ⋯ + a₀ and r = bₙtⁿ + ⋯ + b₀, (7.62)


where m ≤ n, and neither aₘ nor bₙ is zero. By Exercise 4, we may create a nonzero
element of S whose degree is strictly less than deg r, which is a contradiction. Also
by Exercise 4, q and r are unique. ∎
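
The existence half of Theorem 7.7.8 is constructive in practice: ordinary long division repeatedly cancels the leading term of the current remainder. Here is a minimal sketch using exact rational arithmetic. Polynomials are coefficient lists with the constant term first, the zero polynomial is the empty list, and the function names are illustrative choices rather than anything from the text.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(g, f):
    """Divide g by a nonzero f in Q[t]; returns (q, r) with g = f*q + r
    and either r == [] or deg r < deg f."""
    f = trim([Fraction(c) for c in f])
    r = trim([Fraction(c) for c in g])
    if not f:
        raise ZeroDivisionError("f must be a nonzero polynomial")
    q = [Fraction(0)] * max(len(r) - len(f) + 1, 0)
    while len(r) >= len(f):
        shift = len(r) - len(f)
        coeff = r[-1] / f[-1]          # possible because Q is a field
        q[shift] = coeff
        for i, c in enumerate(f):      # subtract coeff * t^shift * f
            r[i + shift] -= coeff * c
        r = trim(r)                    # the leading term has been cancelled
    return trim(q), r

# Divide t^3 - t^2 + 2t - 2 by t - 1.
q, r = poly_divmod([-2, 2, -1, 1], [-1, 1])
```

The only place the field structure of Q is used is the division by the leading coefficient of f.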


Having shown that Q[t] is an ED opens a floodgate of interesting facts about this
very important ring. First, unlike Z[t], Q[t] is a PID, so every nontrivial ideal has a
nonzero generator f, and deg f is minimal among the degrees of all nonzero elements of (f).
Furthermore, from our comments after the statement of Theorem 7.7.5, any polynomial
in (f) whose degree is the same as that of f can serve as a generator of (f). If we write
f = aₙtⁿ + aₙ₋₁tⁿ⁻¹ + ⋯ + a₀, then aₙ ≠ 0 and we can create a new polynomial
m = aₙ⁻¹f = tⁿ + (aₙ₋₁/aₙ)tⁿ⁻¹ + ⋯ + a₀/aₙ whose leading coefficient is one. Since
the polynomial aₙ⁻¹ is in Q[t], it follows that m ∈ (f). Also deg m = deg f, so (m) = (f).
A polynomial whose leading coefficient is the unity element of the ring of coefficients is
called monic, and we see that every ideal in Q[t] has a monic generator.

Since Q[t] is an ED, it is also a UFD, so that every nonconstant polynomial in Q[t] factors
down into irreducible polynomials. What is more, this factorization is unique, at least
up to a point. Any two factorizations of f that appear to be different can be seen upon
inspection to have components that can be paired up as associates. Now associates are
unit multiples of each other, and from Exercise 19 from Section 7.5, the units in Q[t] are
the polynomials of degree zero. Thus, two polynomials in Q[t] are associates if and only
if one is a nonzero constant multiple of the other.


Example 7.7.9. In Q[t], f = t³ + 3t² + 2t can be factored as t(t + 2)(t + 1) or
(4t)(½t + 1)(½t + ½). Associate pairs are t and 4t, t + 2 and ½t + 1, and t + 1 and ½t + ½.


We’ve said a lot about irreducible polynomials in Q[t], but we have not developed 
any criteria by which we can determine whether a polynomial is reducible or irreducible. 
By Theorem 7.5.18, if deg f = 1 and f = gh, then one of g or h has degree zero, and is
therefore a unit. Thus polynomials of degree one are irreducible in Q[t]. 

Another simple criterion for irreducibility involves evaluating f ∈ Q[t] at some a ∈
Q. In Section 7.3, we said we weren't really interested in elements of R[t] as functions
where we would plug in values for t, but more as a string of symbols. Well, that was sort of 
a lie. As a ring in and of itself, R[t] really is exactly as we described it in Section 7.3, and t 
really is just a formal symbol whose presence is used to define addition and multiplication 
of two elements in R[t]. However, for a given f = aₙtⁿ + aₙ₋₁tⁿ⁻¹ + ⋯ + a₀ ∈ R[t],
there are some good reasons we might want to choose a specific a ∈ R and calculate the
value in R of the expression aₙaⁿ + aₙ₋₁aⁿ⁻¹ + ⋯ + a₀. The point is that sometimes the
element of R that is produced by calculating f(a) makes an important statement about 
the polynomial f and its role in R[t]. Perhaps you spent time in high school algebra 
trying to factor polynomials into irreducibles, and one way you might have stumbled 
onto a linear factor of f was by discovering some number a such that f(a) = 0. For 
example, if f = t³ − t² + 2t − 2, you might have noticed that f(1) = 0, so you wrote
f = (t − 1)g, which by division became f = (t − 1)(t² + 2). This worked because of
the following theorem (Exercise 5), where the division algorithm on Q[t] makes itself 
very useful: 


Theorem 7.7.10. Suppose f ∈ Q[t] and a ∈ Q. Then (t − a) | f if and only if f(a) = 0.
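
A quick way to see Theorem 7.7.10 at work is synthetic division, a standard shortcut for dividing by t − a that is not developed in the text. In the sketch below, coefficients are listed from highest degree to lowest, the function name is an illustrative choice, and the last value produced is both the remainder and f(a).

```python
from fractions import Fraction

def synthetic_division(coeffs, a):
    """Divide f by (t - a); coeffs are highest degree first.

    Returns (quotient coefficients, remainder), where the remainder equals f(a).
    """
    a = Fraction(a)
    acc = []
    carry = Fraction(0)
    for c in coeffs:
        carry = carry * a + c      # one Horner step
        acc.append(carry)
    return acc[:-1], acc[-1]

# f = t^3 - t^2 + 2t - 2 divided by t - 1
quotient, remainder = synthetic_division([1, -1, 2, -2], 1)
```

Here the remainder is 0 and the quotient coefficients 1, 0, 2 spell out t² + 2, in line with the factorization f = (t − 1)(t² + 2) of the polynomial f = t³ − t² + 2t − 2.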


Theorem 7.7.10 makes the following almost immediate (Exercise 6): 


Theorem 7.7.11. Suppose f ∈ Q[t] and deg f ∈ {2, 3}. Then f is reducible if and only
if there exists a ∈ Q such that f(a) = 0.


Whether a polynomial in Q[t] is reducible is not an easy question to answer in 
general. There are several criteria that can help answer the question for certain special
polynomials, and you'll probably see them in your upper level algebra class. When
we look at Z[t] we'll see an important relationship between reducibility in Q[t] and 
in Z[t]. 

If there’s one important idea in mathematics that you should find yourself cluing into 
at this point in the game, it is the fact that certain properties of mathematical structures 
are the logical basis for other properties, and that these properties can sometimes be 
isolated and translated over to other structures that might be different in some ways. 
For example, we proved in Section 2.3 that a · 0 = 0 for all a ∈ Z. Then we noted in
Section 7.1 that a · 0 = 0 · a = 0 in any ring by an argument identical to that for Z. The
point is that a few basic ring properties came together to make a · 0 = 0 · a = 0 true (basic
properties of addition, including additive cancellation, and the distributive property),
even without commutativity of multiplication. Thus, any mathematical structure in
which we have additive cancellation and the distributive property will be a structure
where a · 0 = 0 · a = 0.

The fact that Q[t] is an ED calls on the fact that Q is a field, but it does not require 
that it be a particularly special kind of field. If we go back and replace Q with an arbitrary 
field K in Theorems 7.7.8 through 7.7.11, exactly the same proofs work. This can be particularly
interesting if we use the field Z_p for p prime. Here are restatements of Theorems 7.7.8 through
7.7.11 for an arbitrary field, and some examples to illustrate their application beyond Q[t]:


Theorem 7.7.12. Let K be a field. Then K[t] with valuation d(f) = deg f is a Euclidean domain.


To illustrate Theorem 7.7.12, you'll verify the following in Exercise 7: 


Example 7.7.13. Let f) = 307 +4+2and g,; = 3443427 +214 4 in Zs[¢]. 
Then by polynomial division (like in high school algebra), there exist q, r ∈ Z_5[t] such
that g_1 = f_1 q + r and deg r < deg f_1. However, for f_2 = 2t + 2 and g_2 = t^2 in Z_6[t],
it is impossible to write g_2 = f_2 q + r for any q, r ∈ Z_6[t] where either r = 0 or
deg r < deg f_2.
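Both halves of this example can be explored by machine. The sketch below assumes polynomials are stored as coefficient lists from the constant term up, and all helper names are hypothetical. The polynomials f1 = 3t^2 + t + 2 and g1 = t^4 + 2t + 4 are stand-ins chosen for illustration in Z_5[t], not necessarily those of Example 7.7.13. The search in Z_6[t] only tries quotients of degree at most two and constant remainders, so it illustrates the failure of division there without proving it.

```python
from itertools import product

def trim(poly):
    """Drop trailing zero coefficients (lists run from the constant term up)."""
    poly = list(poly)
    while poly and poly[-1] == 0:
        poly.pop()
    return poly

def poly_add(a, b, n):
    size = max(len(a), len(b))
    a = list(a) + [0] * (size - len(a))
    b = list(b) + [0] * (size - len(b))
    return trim((x + y) % n for x, y in zip(a, b))

def poly_mul(a, b, n):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % n
    return trim(out)

def poly_divmod(g, f, p):
    """Division algorithm in Z_p[t], p prime: returns (q, r) with
    g = f*q + r and either r == [] (zero) or deg r < deg f."""
    f = trim(c % p for c in f)
    r = trim(c % p for c in g)
    if not f:
        raise ZeroDivisionError("division by the zero polynomial")
    inv_lead = pow(f[-1], -1, p)      # exists because Z_p is a field (Python 3.8+)
    q = [0] * max(len(r) - len(f) + 1, 0)
    while len(r) >= len(f):
        shift = len(r) - len(f)
        factor = (r[-1] * inv_lead) % p
        q[shift] = factor
        for i, c in enumerate(f):
            r[i + shift] = (r[i + shift] - factor * c) % p
        r = trim(r)
    return trim(q), r

# Illustrative polynomials in Z_5[t] (assumed for this sketch)
f1 = [2, 1, 3]            # 3t^2 + t + 2
g1 = [4, 2, 0, 0, 1]      # t^4 + 2t + 4
q, r = poly_divmod(g1, f1, 5)
print(q, r)               # q = 2t^2 + t, r = 4

# In Z_6[t], look for q of degree <= 2 and constant r with t^2 = (2t + 2)q + r
f2, g2 = [2, 2], [0, 0, 1]
found = any(
    poly_add(poly_mul(f2, list(cand), 6), [c] if c else [], 6) == g2
    for cand in product(range(6), repeat=3)
    for c in range(6)
)
print(found)              # False: no such q and r in the searched range
```

The division step relies on inverting the leading coefficient of f, which is always possible in Z_p but fails for 2 in Z_6.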


Theorem 7.7.14. Let K be a field, f ∈ K[t], and a ∈ K. Then (t - a) | f if and only if
f(a) = 0.


Theorem 7.7.15. Let K be a field, f ∈ K[t], and suppose deg f ∈ {2, 3}. Then f is
reducible if and only if there exists a ∈ K such that f(a) = 0.


In Exercise 8 you will apply Theorem 7.7.15 to several polynomials in Z_3[t]. Since
Z_3 is such a small field, determining whether there exists a ∈ Z_3 such that f(a) = 0 is
very quick.
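Because Z_p is finite, the root search in Theorem 7.7.15 can be done by simply trying every element. Here is a minimal sketch, assuming coefficients are listed from the highest power down and the leading coefficient is nonzero mod p; the function names are hypothetical. The two sample polynomials, t^2 + 1 and t^3 + t + 2 in Z_3[t], both appear later in Section 7.9.

```python
def roots_mod_p(coeffs, p):
    """Return every a in Z_p with f(a) = 0, evaluating f by Horner's rule."""
    def value(a):
        total = 0
        for c in coeffs:
            total = (total * a + c) % p
        return total
    return [a for a in range(p) if value(a) == 0]

def is_reducible_low_degree(coeffs, p):
    """Apply Theorem 7.7.15: valid only when deg f is 2 or 3."""
    degree = len(coeffs) - 1
    if degree not in (2, 3):
        raise ValueError("the root test decides reducibility only for degree 2 or 3")
    return bool(roots_mod_p(coeffs, p))

print(roots_mod_p([1, 0, 1], 3))                    # t^2 + 1 has no roots in Z_3
print(is_reducible_low_degree([1, 0, 1], 3))        # False, so it is irreducible
print(roots_mod_p([1, 0, 1, 2], 3))                 # t^3 + t + 2 has the root 2
print(is_reducible_low_degree([1, 0, 1, 2], 3))     # True
```

The degree check matters: a polynomial of degree four or more can be reducible without having any root, so the test would give wrong answers there.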


7.7.3 Z[t] is a UFD 


We waited to show that Z[t] is a UFD, and now is the time to tackle the question. We 
could have shown the existence of a factorization of a polynomial in Z[t] into irreducibles
earlier, but uniqueness of this factorization up to order and association of the factors 
requires us to view elements of Z[t] as elements of Q[t], where factorizations are unique 
up to association. The reason we have some work to do to show uniqueness is that 
reducibility and association in Q[t] are different from reducibility and association in Z[t].
Polynomials such as 2t + 6 and 10t + 15 are irreducible and associates in Q[t], but not
in Z[t]. First the easy part, which you'll show in Exercise 9: 


Theorem 7.7.16. Every nonzero, nonunit polynomial in Z[t] has a factorization into irre- 
ducible polynomials in Z[t]. 


To make our way to uniqueness up to order and association, we need the following. 
You'll provide the climactic detail in Exercise 10. Remember that the term primitive
applies only to polynomials of degree at least one. 


Theorem 7.7.17 (Gauss' Lemma). The product of two primitive polynomials is primitive.

Proof: Suppose f, g ∈ Z[t] are both primitive polynomials. We show that fg is
primitive by supposing p ∈ N is any prime, then showing there is some coefficient
in fg that is not divisible by p.

Suppose p ∈ N is prime, and write


f = a_m t^m + ··· + a_0 and g = b_n t^n + ··· + b_0. (7.63)


Since f and g are both primitive, there are coefficients in both f and g that are not
divisible by p. Let a_k and b_l be the coefficients of the lowest powers of t in f and g,
respectively, that are not divisible by p; that is, p divides all of a_0, a_1, ..., a_{k-1},
b_0, b_1, ..., b_{l-1}, but not a_k or b_l. Then by Exercise 10, p does not divide the
coefficient of t^{k+l} in fg. Since this is true for all primes, fg is primitive. ∎
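Gauss' Lemma is easy to test experimentally. The sketch below assumes integer coefficient lists from the constant term up, and the helper names (content, random_primitive) are hypothetical. It multiplies randomly chosen primitive polynomials and confirms that each product has content 1; this is a sanity check on examples, not a substitute for the proof.

```python
import random
from functools import reduce
from math import gcd

def content(coeffs):
    """Greatest common divisor of the coefficients (1 means primitive)."""
    return reduce(gcd, (abs(c) for c in coeffs))

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def random_primitive(degree, rng):
    """Draw small integer coefficients until the polynomial is primitive."""
    while True:
        coeffs = [rng.randint(-9, 9) for _ in range(degree + 1)]
        if coeffs[-1] != 0 and content(coeffs) == 1:
            return coeffs

rng = random.Random(0)
all_primitive = all(
    content(poly_mul(random_primitive(3, rng), random_primitive(2, rng))) == 1
    for _ in range(200)
)
print(all_primitive)

# One explicit case: (2t + 3)(3t + 2) = 6t^2 + 13t + 6
print(poly_mul([3, 2], [2, 3]), content(poly_mul([3, 2], [2, 3])))
```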


Theorem 7.7.18. Suppose f ∈ Z[t]. If f is reducible in Q[t], then it is reducible in Z[t].


Theorem 7.7.18 says simply that if a polynomial has integer coefficients, and it can be
factored into polynomials of degree at least one by viewing it as an element of Q[t] and
resorting to rational coefficients in the factors, then you can adjust these factor coefficients
and make them integers. We'll prove Theorem 7.7.18 here in a somewhat conversational
way to keep the notation from getting too sloppy. Notice the point at which we apply
Theorem 7.7.17.


Proof: Suppose f ∈ Z[t], and suppose f = f_1 f_2 for some f_1, f_2 ∈ Q[t], where
deg f_k ≥ 1 for 1 ≤ k ≤ 2. Let d_1, d_2 ∈ Z be the product of all the denominators of
all the coefficients of f_1 and f_2, respectively, then let g_1 = d_1 f_1 and g_2 = d_2 f_2, so
that g_1, g_2 ∈ Z[t]. Next, let c_1, c_2 ∈ Z be the content of g_1 and g_2, respectively, and
factor these out to write g_1 = c_1 h_1 and g_2 = c_2 h_2, where h_1, h_2 ∈ Z[t] are primitive.
If we let c be the content of f, we may write f = cg, where g ∈ Z[t] is primitive.
Thus, we have


(c d_1 d_2) g = (d_1 d_2) f = d_1 f_1 d_2 f_2 = g_1 g_2 = (c_1 c_2) h_1 h_2, (7.64)


where g, h_1, h_2 ∈ Z[t] are all primitive. By Theorem 7.7.17, h_1 h_2 is also primitive, so
the content of the left-hand side of Eq. (7.64) is c d_1 d_2 and the content of the right-hand
side is c_1 c_2. Since c d_1 d_2 and c_1 c_2 are both positive integers, c d_1 d_2 = c_1 c_2, and
g = h_1 h_2. Thus, f = cg = c h_1 h_2 and we have a proper factorization of f in Z[t]. ∎
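The bookkeeping in this proof can be traced on a concrete factorization. The following sketch is illustrative only: the polynomial f = 6t^2 + 5t + 1 and its rational factors (3/2)t + 1/2 and 4t + 2 are chosen for this example and do not come from the text, and the helper names are hypothetical. Coefficients are listed from the constant term up.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def clear_and_extract(f_k):
    """Return (d, c, h) with d * f_k = c * h and h primitive in Z[t].

    d is the product of the denominators of the coefficients of f_k,
    and c is the content of g_k = d * f_k, following the proof.
    """
    d = 1
    for coeff in f_k:
        d *= coeff.denominator
    g = [int(coeff * d) for coeff in f_k]
    c = reduce(gcd, (abs(x) for x in g))
    h = [x // c for x in g]
    return d, c, h

f1 = [Fraction(1, 2), Fraction(3, 2)]     # (3/2)t + 1/2
f2 = [Fraction(2), Fraction(4)]           # 4t + 2
f = poly_mul(f1, f2)                      # 6t^2 + 5t + 1, which lies in Z[t]

d1, c1, h1 = clear_and_extract(f1)
d2, c2, h2 = clear_and_extract(f2)
c = reduce(gcd, (abs(int(x)) for x in f))  # content of f

print(d1, c1, h1)                          # h1 is 3t + 1
print(d2, c2, h2)                          # h2 is 2t + 1
print(c * d1 * d2 == c1 * c2)              # the contents on both sides of Eq. (7.64) agree
print([c * x for x in poly_mul(h1, h2)] == f)   # f = c * h1 * h2 in Z[t]
```

Here d_1 is taken as the product of all denominators, exactly as in the proof, even though a smaller common denominator would also work.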


Now the result we've been waiting for:

Theorem 7.7.19. The factorization of f ∈ Z[t] into irreducible polynomials in Z[t] is
unique up to order and association of the factors.

Proof: Suppose f ∈ Z[t] can be written as


f = p_1 p_2 ··· p_m = q_1 q_2 ··· q_n, (7.65)


where all p_i and q_j are irreducible polynomials in Z[t]. Note that all p_i and q_j are
either prime constant polynomials, or primitive. Since p_m | q_1 ··· q_n, then viewing p_m
and all q_j as elements of Q[t], irreducible in Q[t] by Theorem 7.7.18, we have that
p_m | q_k (in Q[t]) for some k by Theorem 7.6.13. Reordering the q_j, we may assume
p_m | q_n, and since q_n is irreducible in Q[t], we may write (a/b) p_m = q_n for some
a/b ∈ Q. Thus, a p_m = b q_n, and since p_m and q_n are primitive, a = ±b. Therefore,
q_n = ±p_m, so that p_m and q_n are associates in Z[t]. Substituting ±p_m for q_n in
Eq. (7.65) and canceling p_m, we have


p_1 ··· p_{m-1} = ±q_1 ··· q_{n-1}. (7.66)


By the inductive assumption, the remaining p_i and q_j may be reordered and paired
as associates, so that the factorization of f into irreducibles in Z[t] is unique up to
order and association of the factors. ∎


EXERCISES 


1. Prove Theorem 7.7.5: An ED is a PID.

2. Prove Theorem 7.7.6: If D is an ED, then d(u) = d(e) if and only if u is a unit.

3. Prove Theorem 7.7.7: Suppose D is a domain with a valuation d, and with the
property that, for all nonzero a, b ∈ D, either b ∈ (a), or there exist m, n ∈ D such
that d(ma + nb) < d(a). Then D is a PID.¹⁰

4. Finish the proof of Theorem 7.7.8 by showing the following:

(a) The polynomial r_1 = g - fq - (b_n/a_m) t^{n-m} f is an element of S such that
deg r_1 < deg r.

(b) If g = f q_1 + r_1 and g = f q_2 + r_2, where r_1 = 0 or deg r_1 < deg f and where
r_2 = 0 or deg r_2 < deg f, then q_1 = q_2 and r_1 = r_2.¹¹

5. Prove Theorem 7.7.10: Suppose f ∈ Q[t] and a ∈ Q. Then (t - a) | f if and only
if f(a) = 0.

6. Prove Theorem 7.7.11: Suppose f ∈ Q[t] and deg f ∈ {2, 3}. Then f is reducible if
and only if there exists a ∈ Q such that f(a) = 0.¹²

7. Verify the claims in Example 7.7.13: First write g_1 = f_1 q + r for q, r ∈ Z_5[t].
Then show that if q and r are any polynomials in Z_6[t] such that either r = 0 or
deg r < deg f_2, then g_2 = f_2 q + r is impossible.¹³

8. Apply Theorem 7.7.15 to the following polynomials in Z_3[t] by either finding a
proper factorization or explaining why they are irreducible:
(a) f_1 = t^2 + t + 1
(b) f_2 = t^2 + t + 2
(c) f_3 = t^3 + t^2 + 2
(d) f_4 = t^3 + t + 2


¹⁰ Show that an arbitrary ideal I is principal by letting a ∈ I be such that d(a) is minimum. You can
then show that any b ∈ I must be a multiple of a.
¹¹ What can you say about deg(r_2 - r_1)? What does Theorem 7.5.18 allow you to conclude?
¹² If f is reducible, what can you say about the degree of one of its factors?
¹³ For the Z_6[t] claim, the coefficient of t^2 in f_2 q must be one. Show that this is impossible.



9. Prove Theorem 7.7.16: Every nonzero, nonunit polynomial in Z[t] has a factorization
into irreducible polynomials in Z[t].¹⁴


10. Finish the proof of Theorem 7.7.17 by showing p does not divide the coefficient
of t^{k+l} in fg.¹⁵


7.8 Ring Morphisms 


The theory of ring morphisms should seem like a pretty breezy topic after having studied 
group morphisms in Section 6.5. The only difference in the definition in the context 
of rings is that there are two binary operations to preserve. As we’re doing increasingly 
often, we'll be fairly relaxed about notation for elements and operations in the two rings, 
unless we need it to avoid confusion. 


Definition 7.8.1. Suppose R and S are rings and φ : R → S is a function with the
properties that φ(x + y) = φ(x) + φ(y) and φ(xy) = φ(x)φ(y) for all x, y ∈ R. Then
φ is called a morphism from R to S. The terms monomorphism, epimorphism, isomorphism,
and automorphism are defined in a way analogous to that for groups, and if there exists an
isomorphism from R to S, we write R ≅ S.


Here are some examples. Let's start with a trivial one:

Example 7.8.2. If R and S are any rings, the mapping defined by φ(x) = 0 for all x ∈ R
is called the trivial morphism.

Example 7.8.3. If R is any ring, the identity mapping i(x) = x is an automorphism.


Example 7.8.4. Define φ : Z → Z_n by φ(x) = (n) + x, the equivalence class of
x mod n. Thus, φ maps x to its remainder upon division by n, and φ is an epimorphism:


φ(x + y) = (n) + [x + y] = [(n) + x] + [(n) + y] = φ(x) + φ(y) (7.67)
φ(xy) = (n) + xy = [(n) + x][(n) + y] = φ(x)φ(y). (7.68)

Furthermore, φ is onto. For if k ∈ Z_n, then φ(k) = k.
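The two morphism properties in Eqs. (7.67) and (7.68) can be spot-checked on a range of integers. This is a minimal sketch, assuming each coset (n) + x is represented by its remainder in {0, ..., n - 1}; the choice n = 12 and the sample range are arbitrary.

```python
def phi(x, n):
    """Send x to the representative of its coset (n) + x in {0, ..., n - 1}."""
    return x % n

n = 12
samples = range(-30, 31)

additive = all(
    phi(x + y, n) == (phi(x, n) + phi(y, n)) % n
    for x in samples for y in samples
)
multiplicative = all(
    phi(x * y, n) == (phi(x, n) * phi(y, n)) % n
    for x in samples for y in samples
)
onto = {phi(k, n) for k in range(n)} == set(range(n))

print(additive, multiplicative, onto)
```

Python's % operator returns a nonnegative remainder for a positive modulus, so negative integers are also sent to the standard representatives.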
Example 7.8.5. For Z[∛2] in the form of Eq. 7.35, define φ : Z → Z[∛2] by

φ(k) = k + 0∛2 + 0∛4. (7.69)


Then φ is a monomorphism. This example illustrates a subtle distinction between two
ideas that deserves a comment. After we constructed Z[∛2] in Example 7.3.1 (see
page 256), we said that Z is a subring of Z[∛2]. If we had been a bit more rigorous
and abstract in our construction of Z[∛2], we would have built it up, not as a set of
expressions of the form a + b∛2 + c∛4, but as a set of ordered triples

S = {(a, b, c) : a, b, c ∈ Z}, (7.70)


where addition and multiplication are defined to coincide with the definitions in
Example 7.3.1. Specifically, defining addition and multiplication by


(a, b, c) + (d, e, f) = (a + d, b + e, c + f)
(a, b, c) · (d, e, f) = (ad + 2bf + 2ce, ae + bd + 2cf, af + be + cd)     (7.71)


incorporates the behavior of ∛2 into the operations, even though they are mere
manipulations of ordered triples with no apparent presence of ∛2. With these definitions, to say Z
is a subring of Z[∛2] is technically not true, for it is not true that Z ⊆ Z[∛2], as subrings
must be. Whereas Z is a set of numbers, Z[∛2] is a set of ordered triples of numbers.
However, it does not mean that the link between them is somehow illusory. Defining
ψ : Z → S by ψ(n) = (n, 0, 0), we have an exact parallel to φ in Eq. 7.69. Whether
you think of Z[∛2] as we originally defined it in Example 7.3.1 or as in Eqs. (7.70)
and (7.71), we say that we embed Z monomorphically in Z[∛2]. Imagine Z and Z[∛2]
as separate, where Z is a set of elements of the form n, and Z[∛2] consists of elements
of the form (a, b, c). Then φ(Z) is the set of all elements of Z[∛2] of the form (a, 0, 0),
and is isomorphic to Z, hence structurally the same.


¹⁴ Use strong induction on deg f and mimic the proof of Theorem 2.4.8. Theorem 2.4.8 itself takes care
of the case deg f = 0.
¹⁵ The coefficient of t^{k+l} is Σ_{i=0}^{k+l} a_i b_{k+l-i}. Look separately at the terms 0 ≤ i ≤ k - 1, i = k, and
k + 1 ≤ i ≤ k + l.
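The ordered-triple construction of Eqs. (7.70) and (7.71) translates directly into code. The sketch below is only an illustration with hypothetical function names: it implements the two operations on triples, confirms that the triple (0, 1, 0) behaves like a cube root of 2, and checks on a sample range that the embedding n ↦ (n, 0, 0) preserves sums and products.

```python
def add(x, y):
    """Componentwise addition of triples, as in Eq. (7.71)."""
    return tuple(u + v for u, v in zip(x, y))

def mul(x, y):
    """Multiplication of triples, as in Eq. (7.71)."""
    a, b, c = x
    d, e, f = y
    return (a * d + 2 * b * f + 2 * c * e,
            a * e + b * d + 2 * c * f,
            a * f + b * e + c * d)

def psi(n):
    """Embed the integer n as the triple (n, 0, 0)."""
    return (n, 0, 0)

root = (0, 1, 0)                        # plays the role of the cube root of 2
print(mul(root, root))                  # (0, 0, 1), the cube root of 4
print(mul(mul(root, root), root))       # (2, 0, 0), the image of 2

samples = range(-10, 11)
preserves_sums = all(psi(m + n) == add(psi(m), psi(n)) for m in samples for n in samples)
preserves_products = all(psi(m * n) == mul(psi(m), psi(n)) for m in samples for n in samples)
print(preserves_sums, preserves_products)
```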


Example 7.8.5 illustrates the slight breach of rigor we committed in defining ring
extensions in Section 7.3, so let's clear that up here. It illustrates a technicality we need
to be aware of when we say that Z[∛2] is an extension of Z. If R and S are rings with
R ⊆ S, then saying S is an extension of R has the same meaning as saying R is a subring
of S. However, if you start with R and want to extend it to some S, the standard, more
rigorous way is to build S from scratch and then monomorphically embed R in S. Then
when we say that S is an extension of R, we mean that S is the range of a monomorphism
whose domain is R.


Example 7.8.6. Let R be a commutative ring, and fix some a ∈ R. Define φ_a : R[t] →
R by φ_a(f) = f(a). Clearly φ_a is a function (quick mental exercise), but also it's an
epimorphism (Exercise 1). This particularly important morphism is called the evaluation
at a morphism. It maps every polynomial in R[t] to its value at a.
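The morphism properties of evaluation can be observed on specific polynomials. Here is a minimal sketch over R = Z, with coefficient lists running from the constant term up; the polynomials f and g and the point a = 5 are arbitrary choices for illustration, and a check on one pair is of course not a proof.

```python
def poly_add(f, g):
    size = max(len(f), len(g))
    return [(f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)
            for i in range(size)]

def poly_mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            out[i + j] += x * y
    return out

def evaluate(f, a):
    """Evaluation at a, computed by Horner's rule."""
    total = 0
    for c in reversed(f):
        total = total * a + c
    return total

f = [1, -2, 3]       # 3t^2 - 2t + 1
g = [4, 0, 0, 1]     # t^3 + 4
a = 5

print(evaluate(poly_add(f, g), a) == evaluate(f, a) + evaluate(g, a))
print(evaluate(poly_mul(f, g), a) == evaluate(f, a) * evaluate(g, a))
print(evaluate([7], a))   # a constant polynomial maps to itself, which gives onto
```

The product rule is where commutativity of R is needed in general, since evaluating fg rearranges powers of a past coefficients.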


Example 7.8.7. Define φ : Z[i] → Z[i] by φ(a + bi) = a - bi. Then φ is an
automorphism (Exercise 3). This automorphism is called the conjugation morphism, for it sends
a Gaussian integer a + bi to its complex conjugate a - bi.


Example 7.8.8. Let R be a ring with unity element e, and define φ : Z → R by φ(n) =
ne (Definition 7.2.8, page 251). Then φ is a morphism (Exercise 4).


7.8.1 Properties of ring morphisms 

Since a ring morphism φ : R → S preserves addition between R and S as abelian additive
groups, our results from Section 6.5 apply to addition. Thus we gain the following for
free from Theorems 6.5.6 and 6.5.7.


Theorem 7.8.9. Suppose φ : R → S is a ring morphism. Then
1. φ(0) = 0.
2. φ(-r) = -φ(r) for all r ∈ R.
3. φ(nr) = nφ(r) for all r ∈ R and n ∈ Z.


To apply something like Theorems 6.5.6 and 6.5.7 to ring multiplication, we have to 
make some modifications. First here is an observation you'll prove in Exercise 6. 


Theorem 7.8.10. If R is a ring with unity e_R, and if φ : R → S is a ring morphism such
that φ(e_R) = 0, then φ is the trivial morphism.


The contrapositive of Theorem 7.8.10 reveals that if φ : R → S is a nontrivial
morphism, then φ(e_R) ≠ 0. If S has unity e_S, then comparable to Theorem 6.5.6, you
might be tempted to think that a nontrivial morphism φ : R → S would have to satisfy
φ(e_R) = e_S. To prove such a property and the exponent rules related to it, multiplicative
cancellation is necessary. Thus, when S is a domain, a proof of the next theorem becomes
possible. Since the proof of the exponent rules in parts 2-4 would be identical to the proof
of Theorem 6.5.7, you'll prove only parts 1 and 3 in Exercise 7.


Theorem 7.8.11. Suppose R is a ring with unity e_R, S is a domain with unity e_S, and
φ : R → S is a nontrivial ring morphism. Then the following hold:

1. φ(e_R) = e_S.
2. For all r ∈ R and n ∈ W, φ(r^n) = [φ(r)]^n.
3. If u ∈ R is a unit, then so is φ(u), and φ(u^{-1}) = [φ(u)]^{-1}.
4. If u ∈ R is a unit, then φ(u^{-n}) = [φ(u)]^{-n} for all n ∈ N.


Theorems 7.8.9 and 7.8.11 reveal that φ : Z → D in Example 7.8.8 is the only nontrivial
morphism from Z to any other domain. For suppose ψ : Z → D is a nontrivial
morphism. Since 1 is the unity in Z, it must be that ψ(1) = e. Furthermore, for any
n ∈ Z, ψ(n) = ψ(n · 1) = nψ(1) = ne. Applying this result to morphisms φ : Z → Z,
we see that the only nontrivial morphism from Z to Z is the identity. In Exercise 8 you'll
use other parts of Theorems 7.8.9 and 7.8.11 to show that the only automorphism of Q
is the identity.

If S is not a domain, Theorem 7.8.11 might not apply. For example, if φ : Z → Z_{2×2} is
defined by φ(n) = [n 0; 0 0] (a 2 × 2 matrix written row by row, with rows separated by a
semicolon), then φ is a morphism, but φ(1) = [1 0; 0 0] ≠ I_{2×2}. Furthermore,
though 1 is a unit in Z, φ(1) is not a unit in Z_{2×2}.
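This counterexample can be verified by direct computation. The sketch below represents 2 × 2 integer matrices as nested tuples, with hypothetical helper names. It checks the morphism properties on a sample range and uses the determinant to see that φ(1) has no inverse; the determinant test relies on the standard fact that an integer matrix is a unit in Z_{2×2} exactly when its determinant is 1 or -1.

```python
def phi(n):
    """The map n -> [n 0; 0 0] from Z into 2x2 integer matrices."""
    return ((n, 0), (0, 0))

def mat_add(a, b):
    return tuple(tuple(a[i][j] + b[i][j] for j in range(2)) for i in range(2))

def mat_mul(a, b):
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

IDENTITY = ((1, 0), (0, 1))
samples = range(-8, 9)

is_additive = all(phi(m + n) == mat_add(phi(m), phi(n)) for m in samples for n in samples)
is_multiplicative = all(phi(m * n) == mat_mul(phi(m), phi(n)) for m in samples for n in samples)

print(is_additive, is_multiplicative)   # phi behaves as a morphism on the samples
print(phi(1) == IDENTITY)               # False: the unity of Z is not sent to the unity
print(det(phi(1)))                      # 0, so phi(1) is not a unit
```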

Much of our work with group morphisms involved their relationship to normal
subgroups. The interesting relationships between ring morphisms and substructures
involve ideals. Here's a parallel to Theorem 6.5.10. When you prove the left-sided case of
Theorem 7.8.12 in Exercise 9, there will be parts of your proof that will follow immediately
from your work in group theory and not require new arguments. For example, in your
proof of part 1, you may simply point out that since R is a group with respect
to addition, φ(R) is an additive subgroup of S by Theorem 6.5.10. That takes care of
three of the requirements in showing that φ(R) is a subring of S.

Theorem 7.8.12. Suppose φ : R → S is a ring morphism. Then
1. φ(R) is a subring of S.
2. If I is a left (right) ideal of R, then φ(I) is a left (right) ideal of φ(R).
3. If I is a left (right) ideal of S, then φ^{-1}(I) is a left (right) ideal of R.

Similar to groups, the kernel of a ring morphism is defined as

Ker(φ) = {r ∈ R : φ(r) = 0}. (7.72)


Right away, we then have the following. 


Theorem 7.8.13. If φ : R → S is a ring morphism, then Ker(φ) is a two-sided ideal of R.


You'll prove Theorem 7.8.13 in Exercise 10. As with Theorem 7.8.12, you can save
yourself some work by applying Theorem 6.5.12. The following needs no additional
proof, for it follows from Theorem 6.5.13 applied to R as an additive group.


Theorem 7.8.14. If φ : R → S is a ring morphism, then Ker(φ) = {0} if and only if φ is
one-to-one.

If R is any ring with unity element e, the morphism φ : Z → R defined in
Example 7.8.8 can look either of two ways, depending on char R. If char R = 0, then by
definition ne is never zero for nonzero n ∈ Z. The following should be immediate
(Exercise 11).


Theorem 7.8.15. If R is a ring with unity and char R = 0, then φ : Z → R defined by
φ(n) = ne is one-to-one.


Example 7.8.8 and Theorem 7.8.15 say that the integers can be monomorphically
embedded in a ring R with unity and characteristic zero. We loosely say that R contains
Z, meaning it contains a subring generated by its unity that is isomorphic to Z. For
example, in R_{2×2}, the image of Z under φ is (writing each matrix row by row, with rows
separated by a semicolon)


{..., [-2 0; 0 -2], [-1 0; 0 -1], [0 0; 0 0], [1 0; 0 1], [2 0; 0 2], ...}. (7.73)


If char R = n ≠ 0, then there is a smallest n ∈ Z+ for which ne = 0. It might seem
clear that R in this case will contain a subring that looks like Z_n instead of Z. Before we
can make an argument for this, however, we need to create the notion of a quotient ring,
which we will do in Section 7.9.


EXERCISES 


1. Let R be a commutative ring and fix a ∈ R. Show that the function φ_a : R[t] → R
defined by φ_a(f) = f(a) is an epimorphism.


2. Let R be a commutative ring with unity. Describe the evaluation morphisms φ_0
and φ_e.

3. Show that complex conjugation as defined in Example 7.8.7 is an automorphism of
Z[i].

4. Let R be a ring with unity element e. Show that φ : Z → R in Example 7.8.8 defined
by φ(n) = ne is a morphism.


5. Write A ∈ Z_{2×2} as [a_11 a_12; a_21 a_22] (rows separated by a semicolon). Define
φ : Z_{2×2} → Z by φ(A) = a_11. Is φ a morphism?


6. Prove Theorem 7.8.10: If R is a ring with unity e_R, and if φ : R → S is a ring morphism
such that φ(e_R) = 0, then φ is the trivial morphism.


7. Prove parts 1 and 3 of Theorem 7.8.11: Suppose R is a ring with unity e_R, S is a
domain with unity e_S, and φ : R → S is a nontrivial ring morphism. Then φ(e_R) =
e_S. Furthermore, if u ∈ R is a unit, so is φ(u), and φ(u^{-1}) = [φ(u)]^{-1}.


8. Show that the only automorphism of Q is the identity.


9. Prove the left-sided case of Theorem 7.8.12: Suppose φ : R → S is a ring morphism.
Then

(a) φ(R) is a subring of S.
(b) If I is a left ideal of R, then φ(I) is a left ideal of φ(R).
(c) If I is a left ideal of S, then φ^{-1}(I) is a left ideal of R.


10. Prove Theorem 7.8.13: If φ : R → S is a ring morphism, then Ker(φ) is a two-sided
ideal of R.


11. Prove Theorem 7.8.15: If R is a ring with unity and char R = 0, then φ : Z → R
defined by φ(n) = ne is one-to-one.


7.9 Quotient Rings 


Given a group G and H ⊲ G, we built the quotient group G/H. In an analogous way,
given that I is an ideal of R, we can build the quotient ring R/I. Part of the work in
building R/I is exactly like that in building G/H, so part of showing that R/I has ring
properties R1-R10 has already been done. As with groups, building the quotient structure
begins by defining a form of equivalence.


Theorem 7.9.1. Let R be a ring and I an ideal of R. For a, b ∈ R define a ≡_I b if a - b
∈ I. Then ≡_I is an equivalence relation on R.


Since R is an abelian group with respect to its addition operation, I is a normal
additive subgroup of R. Therefore, Theorem 7.9.1 is merely a restatement of Exercise 2a
from Section 6.3 in its additive form and needs no additional proof. Thus we're ready to
define the set R/I with its addition and multiplication operations, and show that it has
all properties R1-R10. There are no surprises in the definitions of the binary operations
on R/I. However, considering the need for H to be a normal subgroup of G in showing
that the binary operation on G/H is well defined, little bells should be going off in your
head as you consider the burden of showing that addition and multiplication on R/I are
well defined. It should be no surprise that you'll exploit the fact that I is an ideal of R in
at least part of this demonstration. What you might not see at first is what kind of ideal I
needs to be and where you'll call on the fact that I is this kind of ideal. Since I is normal
as an additive subgroup of R, addition is well defined by our work in Chapter 6. It's a
different story for multiplication, though. If multiplication in R is not commutative, then
a left ideal might not be a right ideal, and vice versa. Thus, we might wonder whether I
needs merely to be either a left ideal or a right ideal, or whether it needs to be both. It
turns out that I needs to be a two-sided ideal, and you'll see how you need that fact when
you prove the following in Exercise 1.


Theorem 7.9.2. Let R be a ring and I a two-sided ideal of R. For a ∈ R write I + a =
[a], where [a] is the equivalence class of a modulo I. Define addition ⊕ and multiplication
⊗ on R/I = {I + a : a ∈ R} by


(I + a) ⊕ (I + b) = I + (a + b) and (I + a) ⊗ (I + b) = I + (ab). (7.74)

Then R/I is a ring under the operations ⊕ and ⊗.


We can talk our way through almost all properties R1-R10, so that your work in
proving Theorem 7.9.2 will be minimal. Since R is an abelian additive group and I is a
normal additive subgroup, R/I has properties R1-R6. Properties R8-R10 are immediate
from the definitions of ⊕ and ⊗. The only property that takes any real work is R7, showing
that ⊗ is well defined, and this is what you'll show in Exercise 1. True to form, we suppose
I + a = I + b and I + c = I + d and use this to show that I + ac = I + bd. That
is, supposing a - b, c - d ∈ I should somehow allow you to show that ac - bd ∈ I.
Multiplication of a - b and c - d by strategically chosen elements, together with the fact
that I is a two-sided ideal, should get you over the hump.

With R/I defined and shown to be a ring, we should be able to bypass all the
chatty exposition we provided in Chapter 6 and jump right to theorems analogous to
Theorems 6.5.15 and 6.5.16. Go back and take a look at these theorems, then try to state
analogous theorems for rings before you read the theorems below.


Theorem 7.9.3. Suppose R is a ring and I is a two-sided ideal in R. Then the mapping
φ : R → R/I defined by φ(r) = I + r is an epimorphism whose kernel is I.


Almost everything in Theorem 7.9.3 follows directly from Theorem 6.5.15. The fact
that R is an abelian additive group means I is a normal subgroup, so that φ behaves
morphically as far as addition is concerned, is onto, and satisfies Ker(φ) = I. The only
remaining claim is that φ behaves morphically with respect to multiplication. But this
is a one-liner (Exercise 2). Having thought your way through this, writing a complete
proof of the following theorem should come naturally (Exercise 3).


Theorem 7.9.4. Suppose R and S are rings and φ : R → S is an epimorphism. Then
S ≅ R/Ker(φ).


Let's return to the morphism in Example 7.8.8 defined by φ(n) = ne. If R has nonzero
characteristic, then there is a smallest n ∈ N for which ne = 0. By Theorem 7.8.13, Ker(φ)
is an ideal of Z. However, Z is a PID, so Ker(φ) = (k) for some k ∈ N. Since φ(k) = 0,
and since n is the smallest positive integer for which ne = 0, it must be that k ≥ n.
However, because n ∈ Ker(φ), it must be that n is a multiple of k, so that k ≤ n. Thus
k = n. By Theorem 7.9.4, the range of φ is isomorphic to Z/(n). That is, R contains a
subring isomorphic to Z_n.

If we think of a ring with unity element e merely as an abelian additive group, and
let S be the subgroup generated by e, the identity element of the other operation, then
the form of S can be determined by looking at the additive form of Eq. (6.25):


S = {ne : n ∈ Z}, (7.75)


which is precisely the range of φ in Example 7.8.8. Thus, as far as addition is concerned,
S is the smallest additive subgroup of R that contains e. If we can then show that S is
closed under multiplication, we'll have that S is the smallest subring of R that contains
e. From Exercise 9, Section 7.1, closure is immediate. Thus, in bits and pieces, we have
proved the following:


Theorem 7.9.5. Suppose R is a ring with unity. If char R = 0, then R contains a subring
isomorphic to Z. If char R = n ≠ 0, then R contains a subring isomorphic to Z_n. In either
case, such is the smallest subring of R that contains e.


Because polynomial rings over fields are EDs, they make for some particularly
important quotient rings. Let's spend some time studying the quotient ring created by modding
out the ideal generated by f = 2t^3 - t + 5 from Q[t]. The ties back to Z and Z_n are
uncanny, so we'll hold the quotient ring Q[t]/(f) up against Z_n as we dissect it. To be
concrete, we let n = 6 and draw parallels between the relationship of Q[t] to Q[t]/(f)
and the relationship of Z to Z_6.

Since Z is an ED, every a ∈ Z can be written as a = 6q + r for some q, r ∈ Z
and 0 ≤ r ≤ 5. Consequently, every a ∈ Z is equivalent mod 6 to some element of
{0, 1, 2, 3, 4, 5}. Thus every element of Z_6 is a coset that can be addressed by a unique
representative element in {0, 1, 2, 3, 4, 5}. To perform addition and multiplication in Z_6,
a purist would write something like


[(6) + 4] + [(6) + 3] = (6) + 7 = (6) + 1, (7.76)
or
[(6) + 5] × [(6) + 4] = (6) + 20 = (6) + 2, (7.77)


where the first step in these two calculations is application of the definitions of addition
and multiplication in Z_6 from Theorem 7.9.2, and the second step is an application of
equivalence mod 6 to simplify the calculation to a standard form with a representative
element from {0, 1, 2, 3, 4, 5}. As long as we realize that this is what we are doing, we
can write the calculations in Eqs. (7.76) and (7.77) as


4 + 3 ≡_6 7 ≡_6 1 and 5 × 4 ≡_6 20 ≡_6 2. (7.78)


This form has the imagery of performing addition and multiplication in Z, with the
stipulation that any sum or product that gets kicked out of bounds (i.e., out of {0, 1, 2, 3, 4, 5})
is translated back into {0, 1, 2, 3, 4, 5} by subtracting a multiple of 6. This will help us
to see how we may view elements of Z_6, and how they combine by addition and
multiplication to produce other elements of Z_6.
The way to visualize elements of Q[t]/(f) in terms of polynomials in Q[t] is strikingly
similar. Since Q[t] is an ED, any g ∈ Q[t] can be written uniquely as g = fq + r for
some q, r ∈ Q[t] where either r = 0 or deg r < deg f. Thus, if we consider any (f) +
g ∈ Q[t]/(f), there is a unique r ∈ Q[t] such that (f) + r = (f) + g and either
r = 0 or deg r < deg f. Therefore, elements of Q[t]/(f) may always be addressed by
a representative element r where either r = 0 or deg r < deg f. Since we're using
f = 2t^3 - t + 5, then we may write


Q[t]/(f) = {(f) + at^2 + bt + c : a, b, c ∈ Q}, (7.79)


and know that every element of Q[t]/(f) can be written as some (f) + at^2 + bt + c.
Furthermore, if (f) + r_1, (f) + r_2 ∈ Q[t]/(f) and (f) + r_1 = (f) + r_2 are written in the
form of polynomials in Eq. (7.79), then r_2 ≡_(f) r_1, so that r_2 - r_1 is a multiple of f. Now
nonzero multiples of f cannot have degree less than deg f, but deg(r_2 - r_1) ≥ deg f is
impossible. Thus r_1 = r_2, and we have that different polynomials of the form at^2 + bt + c
will always generate different cosets of (f).

In the same way we view elements of Z_6 simply as {0, 1, 2, 3, 4, 5}, we can view
elements of Q[t]/(f) simply as polynomials of the form at^2 + bt + c. What we must
look at is how they add and multiply. Instead of using coset notation and writing [(f) +
a_1 t^2 + b_1 t + c_1] + [(f) + a_2 t^2 + b_2 t + c_2], we can just write


[a_1 t^2 + b_1 t + c_1] + [a_2 t^2 + b_2 t + c_2] ≡_(f) (a_1 + a_2)t^2 + (b_1 + b_2)t + (c_1 + c_2). (7.80)


Adding two such polynomials cannot produce a sum of any larger degree, so Eq. (7.80)
is all that needs to be said about addition in Q[t]/(f). However, for multiplication, let's
illustrate with a concrete example.


(4t^2 + 2)(3t^2 - 2t + 8) ≡_(f) 12t^4 - 8t^3 + 38t^2 - 4t + 16
≡_(f) (6t - 4) f + 44t^2 - 38t + 36     (7.81)
≡_(f) 44t^2 - 38t + 36.


If we multiply two elements of Q[t]/(f) as if they were polynomials in Q[t], and we
produce a product of degree at least three, we can apply the division algorithm to subtract
off an appropriate multiple of f from the product to produce an equivalent polynomial
of the form at^2 + bt + c. You'll work through similar details for another example in
Exercise 4.
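The calculation in Eq. (7.81) can be reproduced mechanically: multiply in Q[t], then keep the remainder on division by f. This is a minimal sketch with hypothetical helper names, using exact rational arithmetic and coefficient lists from the constant term up.

```python
from fractions import Fraction

def poly_mul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += Fraction(x) * Fraction(y)
    return out

def poly_mod(g, f):
    """Remainder of g on division by f in Q[t] (f has a nonzero leading coefficient)."""
    r = [Fraction(c) for c in g]
    while len(r) >= len(f):
        factor = r[-1] / f[-1]
        shift = len(r) - len(f)
        for i, c in enumerate(f):
            r[i + shift] -= factor * c
        r.pop()                      # the leading coefficient is now zero
    return r

f = [5, -1, 0, 2]                    # 2t^3 - t + 5
x = [2, 0, 4]                        # 4t^2 + 2
y = [8, -2, 3]                       # 3t^2 - 2t + 8

product = poly_mul(x, y)
print(product)                       # coefficients of 12t^4 - 8t^3 + 38t^2 - 4t + 16
print(poly_mod(product, f))          # coefficients of 44t^2 - 38t + 36
```

The remainder always comes back as a list of three coefficients, matching the standard form at^2 + bt + c of Eq. (7.79).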

Now for another very interesting example. Since Z_3 is a field, Z_3[t] is an ED, and
we can construct the quotient ring Z_3[t]/(f) for f ∈ Z_3[t] in a similar way. Let's use
f = t^3 + t + 2, construct the quotient ring, and look at addition and multiplication. By
exactly the same reasoning as in Q[t], Z_3[t]/(f) = {at^2 + bt + c : a, b, c ∈ Z_3}. Notice
that this is a finite set. Each of a, b, and c can take on values from {0, 1, 2}, so Z_3[t]/(f)
has 27 elements. Adding elements of Z_3[t]/(f) is easy:


(2t^2 + t + 2) + (t^2 + 2t + 2) ≡_(f) 3t^2 + 3t + 4 ≡_(f) 1. (7.82)

Doing multiplication would look like the following if we simplify the product by way of
the division algorithm:


(2t^2 + t + 2)(t^2 + 2t + 2) ≡_(f) 2t^4 + 5t^3 + 8t^2 + 6t + 4
≡_(f) 2t^4 + 2t^3 + 2t^2 + 1
≡_(f) (2t + 2) f                                            (7.83)
≡_(f) 0.


However, there is a slick way to simplify multiplication by making substitutions. In
Z_3[t]/(f), f ≡_(f) 0, or t^3 + t + 2 ≡_(f) 0. This can also be written as t^3 ≡_(f) -t - 2 ≡_(f) 2t + 1.
The upshot is that any t^3 produced in the process of multiplication can be replaced with
2t + 1, thus bringing the degree of a product back down:


(2t^2 + t + 2)(t^2 + 2t + 2) ≡_(f) 2t^4 + 5t^3 + 8t^2 + 6t + 4
≡_(f) 2t^4 + 2t^3 + 2t^2 + 1
≡_(f) (2t)t^3 + 2t^3 + 2t^2 + 1
≡_(f) (2t)(2t + 1) + 2(2t + 1) + 2t^2 + 1               (7.84)
≡_(f) 4t^2 + 2t + 4t + 2 + 2t^2 + 1
≡_(f) 6t^2 + 6t + 3
≡_(f) 0.


You'll practice this technique in Exercises 5 and 6.
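The substitution trick lends itself to a short program. The sketch below, with hypothetical names, stores c_0 + c_1 t + c_2 t^2 as the triple (c_0, c_1, c_2) and multiplies in Z_3[t]/(t^3 + t + 2) by replacing t^3 with 2t + 1, handling the t^4 term first as t · t^3.

```python
from itertools import product

P = 3   # coefficients live in Z_3

def mul_mod_f(a, b):
    """Multiply two elements of Z_3[t]/(t^3 + t + 2).

    Each element is a triple (c0, c1, c2) standing for c0 + c1*t + c2*t^2.
    """
    prod = [0] * 5
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] = (prod[i + j] + x * y) % P
    # t^k = t^(k-3) * t^3 is replaced by t^(k-3) * (2t + 1), highest power first
    for k in (4, 3):
        c = prod[k]
        prod[k] = 0
        prod[k - 3] = (prod[k - 3] + c) % P
        prod[k - 2] = (prod[k - 2] + 2 * c) % P
    return tuple(prod[:3])

x = (2, 1, 2)    # 2t^2 + t + 2
y = (2, 2, 1)    # t^2 + 2t + 2
print(mul_mod_f(x, y))                         # (0, 0, 0), agreeing with Eq. (7.84)

elements = list(product(range(P), repeat=3))
print(len(elements))                           # 27 elements in the quotient ring
print(mul_mod_f((0, 1, 0), (0, 0, 1)))         # t * t^2 = t^3 becomes 2t + 1
```

Two nonzero elements multiplying to zero shows that this quotient ring is not a domain, which fits with t^3 + t + 2 having the root 2 in Z_3.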

The last two theorems we want to present in this section and the examples following 
each illustrate some very interesting implications of the results we have worked so hard to 
develop. The two theorems are not results that would likely jump out at you as obvious, 
but they are elegant and not difficult to prove. You'll prove the first one in Exercise 8. 


Theorem 7.9.6. Suppose R is a commutative ring, and I is an ideal of R. Then R/I is an
integral domain if and only if I is a prime ideal.


We state our last theorem as an if-and-only-if theorem, but you will be required to
prove only one direction.


Theorem 7.9.7. Suppose R is a commutative ring with unity e, and I is an ideal of R.
Then R/I is a field if and only if I is a maximal ideal.


When you prove the ⇐ direction of Theorem 7.9.7 in Exercise 9, you're going to
suppose I is a maximal ideal of a commutative ring R with unity e, then show that R/I is
a field. Now R/I is also a commutative ring with unity I + e. Also, since I is a proper ideal
of R, R/I has more than one element, so you can choose some I + a ∈ R/I such that
I + a ≠ I + 0, for which you must find an inverse in R/I. Since a ∉ I, Theorem 7.4.14
and the maximality of I are just the things to help you do that.

Now let's put all this together in a very elegant construction. If K is a field, then K[t]
is an ED, hence a PID. If f ∈ K[t] is irreducible (prime), then (f) is a prime ideal by
Theorem 7.6.16. By Theorem 7.6.8, (f) is also maximal. Therefore, K[t]/(f) is a field.
We can use these facts to do the following:


Example 7.9.8. Construct a field with nine elements. 


Solution: Since $t^2 + 1$ is irreducible in $\mathbb{Z}_3[t]$, $\mathbb{Z}_3[t]/(t^2 + 1) = \{at + b : a, b \in \mathbb{Z}_3\}$ is
a field with nine elements. For notational simplicity, we write $at + b = (a, b)$ and
illustrate multiplication in Table 7.1. Notice $t^2 \equiv_{(t^2+1)} 2$ and the manifestation of this
in the table. Also, notice how the table reveals that every element has a multiplicative
inverse.


Table 7.1 Cayley table for multiplication in $\mathbb{Z}_3[t]/(t^2 + 1)$


  ×   | (0,0) (0,1) (0,2) (1,0) (1,1) (1,2) (2,0) (2,1) (2,2)
(0,0) | (0,0) (0,0) (0,0) (0,0) (0,0) (0,0) (0,0) (0,0) (0,0)
(0,1) | (0,0) (0,1) (0,2) (1,0) (1,1) (1,2) (2,0) (2,1) (2,2)
(0,2) | (0,0) (0,2) (0,1) (2,0) (2,2) (2,1) (1,0) (1,2) (1,1)
(1,0) | (0,0) (1,0) (2,0) (0,2) (1,2) (2,2) (0,1) (1,1) (2,1)
(1,1) | (0,0) (1,1) (2,2) (1,2) (2,0) (0,1) (2,1) (0,2) (1,0)
(1,2) | (0,0) (1,2) (2,1) (2,2) (0,1) (1,0) (1,1) (2,0) (0,2)
(2,0) | (0,0) (2,0) (1,0) (0,1) (2,1) (1,1) (0,2) (2,2) (1,2)
(2,1) | (0,0) (2,1) (1,2) (1,1) (0,2) (2,0) (2,2) (1,0) (0,1)
(2,2) | (0,0) (2,2) (1,1) (2,1) (1,0) (0,2) (1,2) (0,1) (2,0)
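The entries of Table 7.1 can be regenerated with a short Python sketch. This is an illustration rather than the book's method: the names `ELEMENTS`, `mul`, and `inverse` are hypothetical, and the pair `(a, b)` stands for $at + b$ as in the solution above. The product rule comes from expanding $(at + b)(ct + d)$ and replacing $t^2$ by $-1$, which is 2 in $\mathbb{Z}_3$.

```python
from itertools import product

P = 3
# (a, b) stands for at + b; this ordering matches the rows of Table 7.1.
ELEMENTS = list(product(range(P), repeat=2))


def mul(x, y, p=P):
    """Compute (at + b)(ct + d) with t^2 replaced by -1."""
    a, b = x
    c, d = y
    return ((a * d + b * c) % p, (b * d - a * c) % p)


def inverse(x):
    """Search the nine elements for the y with x * y = (0, 1), the unity."""
    return next(y for y in ELEMENTS if mul(x, y) == (0, 1))


# A degree 2 polynomial over a field is irreducible exactly when it has no root.
has_root = any((r * r + 1) % P == 0 for r in range(P))

table = {(x, y): mul(x, y) for x in ELEMENTS for y in ELEMENTS}
inverses = {x: inverse(x) for x in ELEMENTS if x != (0, 0)}

for x in ELEMENTS:
    print(x, "|", " ".join(str(table[(x, y)]) for y in ELEMENTS))
```

The brute-force search in `inverse` succeeds for all eight nonzero elements, which is the computational counterpart of reading an inverse off each nonzero row of the table.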


In Exercise 10, you'll construct a field with eight elements. 


EXERCISES 


1. Finish the proof of Theorem 7.9.2 by showing that R/I has property R7.

2. Finish the proof of Theorem 7.9.3 by showing that $\phi$ behaves morphically with
respect to multiplication.

3. Prove Theorem 7.9.4: Suppose R and S are rings and $\phi: R \to S$ is an epimorphism.
Then $S \cong R/\operatorname{Ker}(\phi)$.


4. In $\mathbb{Q}[t]$, let $f = t^4 + 2t + 1$. Construct the form of elements of $\mathbb{Q}[t]/(f)$, and illustrate
addition and multiplication.

5. In $\mathbb{Z}_3[t]$, let $f = t^4 + 2t + 1$. Construct the form of elements of $\mathbb{Z}_3[t]/(f)$, and
illustrate addition and multiplication using the fact that $t^4 \equiv_{(f)} t + 2$.


6. From our work in this section, $\mathbb{Q}[t]/(t^2 - 2) = \{at + b : a, b \in \mathbb{Q}\}$. Use a trick similar
to that in Exercise 5 to simplify $(at + b)(ct + d)$.

7. In Exercise 4 from Section 7.3, you showed $\mathbb{Q}[\sqrt{2}]$ is a field. Calculate and simplify
$(b + a\sqrt{2})(d + c\sqrt{2})$, and compare to Exercise 6.

8. Prove Theorem 7.9.6: Suppose R is a commutative ring and I an ideal of R. Then
R/I is an integral domain if and only if I is a prime ideal.

9. Prove one direction of Theorem 7.9.7: Suppose R is a commutative ring with unity e,
and I is a maximal ideal of R. Then R/I is a field.

10. Construct a field with eight elements, providing complete Cayley tables for addition
and multiplication.